NEISSERIAL ANTIGENS



PATENT 



This application is a continuation-in-part of international patent application PCT/IB98/01665, filed 
October 9, 1998, from which priority is claimed under 35 U.S.C. § 119.

This invention relates to antigens from Neisseria bacteria. 

BACKGROUND ART

Neisseria meningitidis and Neisseria gonorrhoeae are non-motile, gram negative diplococci that 
are pathogenic in humans. N. meningitidis colonises the pharynx and causes meningitis (and, 
occasionally, septicaemia in the absence of meningitis); N. gonorrhoeae colonises the genital tract
and causes gonorrhea. Although they colonise different areas of the body and cause completely
different diseases, the two pathogens are closely related. One feature that clearly
differentiates meningococcus from gonococcus is the presence of a polysaccharide capsule that is
present in all pathogenic meningococci.

N. gonorrhoeae caused approximately 800,000 cases per year during the period 1983-1990 in the
United States alone (chapter by Meitzner & Cohen, "Vaccines Against Gonococcal Infection", In:
New Generation Vaccines, 2nd edition, ed. Levine, Woodrow, Kaper, & Cobon, Marcel Dekker,
New York, 1997, pp.817-842). The disease causes significant morbidity but limited mortality.

Vaccination against N. gonorrhoeae would be highly desirable, but repeated attempts have failed.
The main candidate antigens for this vaccine are surface-exposed proteins such as pili, porins,
opacity-associated proteins (Opas) and other surface-exposed proteins such as the Lip, Laz, IgA1
protease and transferrin-binding proteins. The lipooligosaccharide (LOS) has also been suggested
as a vaccine (Meitzner & Cohen, supra).

N. meningitidis causes both endemic and epidemic disease. In the United States the attack rate is
0.6-1 per 100,000 persons per year, and it can be much greater during outbreaks (see Lieberman
et al. (1996) Safety and Immunogenicity of a Serogroups A/C Neisseria meningitidis
Oligosaccharide-Protein Conjugate Vaccine in Young Children. JAMA 275(19):1499-1503;
Schuchat et al (1997) Bacterial Meningitis in the United States in 1995. N Engl J Med 337(14):970-976).
In developing countries, endemic disease rates are much higher and during epidemics
incidence rates can reach 500 cases per 100,000 persons per year. Mortality is extremely high, at
10-20% in the United States, and much higher in developing countries. Following the introduction
of the conjugate vaccine against Haemophilus influenzae, N. meningitidis is the major cause of
bacterial meningitis at all ages in the United States (Schuchat et al (1997) supra).

Based on the organism's capsular polysaccharide, 12 serogroups of N. meningitidis have been
identified. Group A is the pathogen most often implicated in epidemic disease in sub-Saharan
Africa. Serogroups B and C are responsible for the vast majority of cases in the United States and
in most developed countries. Serogroups W135 and Y are responsible for the rest of the cases in
the United States and developed countries. The meningococcal vaccine currently in use is a
tetravalent polysaccharide vaccine composed of serogroups A, C, Y and W135. Although
efficacious in adolescents and adults, it induces a poor immune response and short duration of
protection, and cannot be used in infants [eg. Morbidity and Mortality Weekly Report, Vol.46, No.
RR-5 (1997)]. This is because polysaccharides are T-cell independent antigens that induce a weak
immune response that cannot be boosted by repeated immunization. Following the success of the
vaccination against H. influenzae, conjugate vaccines against serogroups A and C have been
developed and are at the final stage of clinical testing (Zollinger WD "New and Improved Vaccines
Against Meningococcal Disease" in: New Generation Vaccines, supra, pp. 469-488; Lieberman et
al (1996) supra; Costantino et al (1992) Development and phase I clinical testing of a conjugate
vaccine against meningococcus A and C. Vaccine 10:691-698).

Meningococcus B remains a problem, however. This serotype currently is responsible for
approximately 50% of total meningitis in the United States, Europe, and South America. The
polysaccharide approach cannot be used because the menB capsular polysaccharide is a polymer
of α(2-8)-linked N-acetyl neuraminic acid that is also present in mammalian tissue. This results in
tolerance to the antigen; indeed, if an immune response were elicited, it would be anti-self, and
therefore undesirable. In order to avoid induction of autoimmunity and to induce a protective
immune response, the capsular polysaccharide has, for instance, been chemically modified by
substituting the N-acetyl groups with N-propionyl groups, leaving the specific antigenicity
unaltered (Romero & Outschoorn (1994) Current status of Meningococcal group B vaccine
candidates: capsular or non-capsular? Clin Microbiol Rev 7(4):559-575).

Alternative approaches to menB vaccines have used complex mixtures of outer membrane proteins
(OMPs), containing either the OMPs alone, or OMPs enriched in porins, or deleted of the class 4
OMPs that are believed to induce antibodies that block bactericidal activity. This approach
produces vaccines that are not well characterized. They are able to protect against the homologous
strain, but are not effective at large where there are many antigenic variants of the outer membrane
proteins. To overcome the antigenic variability, multivalent vaccines containing up to nine different
porins have been constructed (eg. Poolman JT (1992) Development of a meningococcal vaccine.
Infect. Agents Dis. 4:13-28). Additional proteins to be used in outer membrane vaccines have been
the opa and opc proteins, but none of these approaches have been able to overcome the antigenic
variability (eg. Ala'Aldeen & Borriello (1996) The meningococcal transferrin-binding proteins 1
and 2 are both surface exposed and generate bactericidal antibodies capable of killing homologous
and heterologous strains. Vaccine 14(1):49-53).

A certain amount of sequence data is available for meningococcal and gonococcal genes and
proteins (eg. EP-A-0467714, WO96/29412), but this is by no means complete. The provision of
further sequences could provide an opportunity to identify secreted or surface-exposed proteins that
are presumed targets for the immune system and which are not antigenically variable. For instance,
some of the identified proteins could be components of efficacious vaccines against meningococcus
B, some could be components of vaccines against all meningococcal serotypes, and others could
be components of vaccines against all pathogenic Neisseriae.

THE INVENTION 

The invention provides proteins comprising the Neisserial amino acid sequences disclosed in the 
examples. These sequences relate to N. meningitidis or N. gonorrhoeae.

It also provides proteins comprising sequences homologous (ie. having sequence identity) to the
Neisserial amino acid sequences disclosed in the examples. Depending on the particular sequence,
the degree of identity is preferably greater than 50% (eg. 65%, 80%, 90%, or more). These
homologous proteins include mutants and allelic variants of the sequences disclosed in the
examples. Typically, 50% identity or more between two proteins is considered to be an indication
of functional equivalence. Identity between the proteins is preferably determined by the
Smith-Waterman homology search algorithm as implemented in the MPSRCH program (Oxford
Molecular), using an affine gap search with parameters gap open penalty=12 and gap extension
penalty=1.
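
The identity determination described above can be illustrated with a minimal sketch of a Smith-Waterman local alignment with affine gaps. This is not the MPSRCH implementation: the match/mismatch scores, the convention that a gap of length k costs gap_open + (k-1)*gap_extend, and the definition of identity as identical positions divided by aligned columns are all assumptions made for illustration, and the function name is hypothetical. Only the gap open penalty of 12 and gap extension penalty of 1 come from the text.

```python
def smith_waterman_identity(a, b, match=5, mismatch=-4, gap_open=12, gap_extend=1):
    """Return (best local score, percent identity over the aligned columns).

    match/mismatch are illustrative assumptions; a real search would use a
    substitution matrix. gap_open and gap_extend follow the text (12 and 1).
    """
    neg = float("-inf")
    n, m = len(a), len(b)
    # H: best local score ending at (i, j); E: ending with a gap in a;
    # F: ending with a gap in b (Gotoh's three-matrix affine-gap recurrence).
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg] * (m + 1) for _ in range(n + 1)]
    F = [[neg] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + sub, E[i][j], F[i][j])
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell, counting identical and total columns.
    i, j = best_pos
    state = "H"
    identical = columns = 0
    while i > 0 and j > 0:
        if state == "H":
            if H[i][j] == 0:
                break  # start of the local alignment
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if H[i][j] == H[i - 1][j - 1] + sub:
                identical += a[i - 1] == b[j - 1]
                columns += 1
                i, j = i - 1, j - 1
            elif H[i][j] == E[i][j]:
                state = "E"
            else:
                state = "F"
        elif state == "E":  # gap column that consumes a residue of b
            columns += 1
            if E[i][j] == H[i][j - 1] - gap_open:
                state = "H"
            j -= 1
        else:  # state "F": gap column that consumes a residue of a
            columns += 1
            if F[i][j] == H[i - 1][j] - gap_open:
                state = "H"
            i -= 1
    percent = 100.0 * identical / columns if columns else 0.0
    return best, percent
```

Under these assumptions, a candidate protein would meet the preferred threshold when the returned percentage exceeds 50.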

The invention further provides proteins comprising fragments of the Neisserial amino acid
sequences disclosed in the examples. The fragments should comprise at least n consecutive amino 
acids from the sequences and, depending on the particular sequence, n is 7 or more (eg. 8, 10, 12, 
14, 16, 18, 20 or more). Preferably the fragments comprise an epitope from the sequence. 
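
As an illustration of the "at least n consecutive amino acids" criterion, the following sketch enumerates the n-residue windows of a reference sequence and checks whether a candidate protein contains any of them. The function names and the example sequences are hypothetical and are not taken from the examples; n defaults to 7, the minimum stated above.

```python
def fragments(sequence, n=7):
    """Return every window of exactly n consecutive residues in sequence."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


def contains_fragment(candidate, reference, n=7):
    """True if candidate contains at least n consecutive residues of reference."""
    return any(frag in candidate for frag in fragments(reference, n))


# Hypothetical sequences, for illustration only.
reference = "MKTAYIAKQR"
print(fragments(reference, 7))
print(contains_fragment("GGGKTAYIAKGGG", reference))
```

Any longer shared stretch necessarily contains a shared window of length n, so testing windows of exactly n residues is sufficient.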

The proteins of the invention can, of course, be prepared by various means (eg. recombinant 
expression, purification from cell culture, chemical synthesis etc.) and in various forms (eg. native, 
fusions etc.). They are preferably prepared in substantially pure or isolated form (ie. substantially 
free from other Neisserial or host cell proteins).

According to a further aspect, the invention provides antibodies which bind to these proteins. These 
may be polyclonal or monoclonal and may be produced by any suitable means. 

According to a further aspect, the invention provides nucleic acid comprising the Neisserial 
nucleotide sequences disclosed in the examples. In addition, the invention provides nucleic acid 
comprising sequences homologous (ie. having sequence identity) to the Neisserial nucleotide 
sequences disclosed in the examples. 

Furthermore, the invention provides nucleic acid which can hybridise to the Neisserial nucleic acid 
disclosed in the examples, preferably under "high stringency" conditions (eg. 65°C in a 0.1×SSC,
0.5% SDS solution). 

Nucleic acid comprising fragments of these sequences is also provided. These should comprise
at least n consecutive nucleotides from the Neisserial sequences and, depending on the particular
sequence, n is 10 or more (eg. 12, 14, 15, 18, 20, 25, 30, 35, 40 or more).

According to a further aspect, the invention provides nucleic acid encoding the proteins and protein 
fragments of the invention. 

It should also be appreciated that the invention provides nucleic acid comprising sequences 
complementary to those described above (eg. for antisense or probing purposes). 
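
By way of illustration, a complementary strand (for example for a probe or antisense sequence) can be derived as the reverse complement of a DNA sequence. This is a minimal sketch; the function name and the example sequence are hypothetical, and only the unambiguous bases A, C, G and T are handled.

```python
# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the complementary strand of dna, written 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGAAC"))  # GTTCAT
```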

Nucleic acid according to the invention can, of course, be prepared in many ways (eg. by chemical
synthesis, from genomic or cDNA libraries, from the organism itself etc.) and can take various 
forms (eg. single stranded, double stranded, vectors, probes etc.). 

In addition, the term "nucleic acid" includes DNA and RNA, and also their analogues, such as 
those containing modified backbones, and also peptide nucleic acids (PNA) etc. 

According to a further aspect, the invention provides vectors comprising nucleotide sequences of 
the invention (eg. expression vectors) and host cells transformed with such vectors. 

According to a further aspect, the invention provides compositions comprising protein, antibody, 
and/or nucleic acid according to the invention. These compositions may be suitable as vaccines, 
for instance, or as diagnostic reagents, or as immunogenic compositions. 

The invention also provides nucleic acid, protein, or antibody according to the invention for use 
as medicaments (eg. as vaccines) or as diagnostic reagents. It also provides the use of nucleic acid, 
protein, or antibody according to the invention in the manufacture of: (i) a medicament for treating 
or preventing infection due to Neisserial bacteria; (ii) a diagnostic reagent for detecting the 
presence of Neisserial bacteria or of antibodies raised against Neisserial bacteria; and/or (iii) a 
reagent which can raise antibodies against Neisserial bacteria. Said Neisserial bacteria may be any 
species or strain (such as N. gonorrhoeae, or any strain of N. meningitidis, such as strain A, strain 
B or strain C). 

The invention also provides a method of treating a patient, comprising administering to the patient 
a therapeutically effective amount of nucleic acid, protein, and/or antibody according to the 
invention. 

According to further aspects, the invention provides various processes. 

A process for producing proteins of the invention is provided, comprising the step of culturing a 
host cell according to the invention under conditions which induce protein expression. 

A process for producing protein or nucleic acid of the invention is provided, wherein the protein
or nucleic acid is synthesised in part or in whole using chemical means. 

A process for detecting polynucleotides of the invention is provided, comprising the steps of: (a)
contacting a nucleic acid probe according to the invention with a biological sample under hybridizing
conditions to form duplexes; and (b) detecting said duplexes.

A process for detecting proteins of the invention is provided, comprising the steps of: (a) contacting
an antibody according to the invention with a biological sample under conditions suitable for the
formation of antibody-antigen complexes; and (b) detecting said complexes.

A summary of standard techniques and procedures which may be employed in order to perform the 
invention (eg. to utilise the disclosed sequences for vaccination or diagnostic purposes) follows. 
This summary is not a limitation on the invention but, rather, gives examples that may be used, but 
are not required.

General 

The practice of the present invention will employ, unless otherwise indicated, conventional
techniques of molecular biology, microbiology, recombinant DNA, and immunology, which are
within the skill of the art. Such techniques are explained fully in the literature eg. Sambrook
Molecular Cloning: A Laboratory Manual, Second Edition (1989); DNA Cloning, Volumes I and
II (D.N. Glover ed. 1985); Oligonucleotide Synthesis (M.J. Gait ed. 1984); Nucleic Acid
Hybridization (B.D. Hames & S.J. Higgins eds. 1984); Transcription and Translation (B.D. Hames
& S.J. Higgins eds. 1984); Animal Cell Culture (R.I. Freshney ed. 1986); Immobilized Cells and
Enzymes (IRL Press, 1986); B. Perbal, A Practical Guide to Molecular Cloning (1984); the
Methods in Enzymology series (Academic Press, Inc.), especially volumes 154 & 155; Gene
Transfer Vectors for Mammalian Cells (J.H. Miller and M.P. Calos eds. 1987, Cold Spring Harbor
Laboratory); Mayer and Walker, eds. (1987), Immunochemical Methods in Cell and Molecular
Biology (Academic Press, London); Scopes, (1987) Protein Purification: Principles and Practice,
Second Edition (Springer-Verlag, N.Y.), and Handbook of Experimental Immunology, Volumes
I-IV (D.M. Weir and C.C. Blackwell eds 1986).

Standard abbreviations for nucleotides and amino acids are used in this specification. 

All publications, patents, and patent applications cited herein are incorporated in full by reference. 
In particular, the contents of UK patent applications 9723516.2, 9724190.5, 9724386.9, 9725158.1, 
9726147.3, 9800759.4, and 9819016.8 are incorporated herein. 

Definitions

A composition containing X is "substantially free" of Y when at least 85% by weight of the total
X+Y in the composition is X. Preferably, X comprises at least about 90% by weight of the total of 
X+Y in the composition, more preferably at least about 95% or even 99% by weight. 
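
The definition above reduces to a simple weight-fraction test, sketched below. The function names are hypothetical; the 85% default threshold and the preferred 90%, 95% and 99% levels come from the definition, and the weights may be in any unit provided both use the same one.

```python
def purity_fraction(weight_x, weight_y):
    """Fraction by weight of X in the total of X+Y."""
    total = weight_x + weight_y
    if total <= 0:
        raise ValueError("total weight of X+Y must be positive")
    return weight_x / total


def is_substantially_free(weight_x, weight_y, threshold=0.85):
    """True if X makes up at least `threshold` of X+Y by weight."""
    return purity_fraction(weight_x, weight_y) >= threshold


print(is_substantially_free(85, 15))        # meets the 85% minimum
print(is_substantially_free(85, 15, 0.95))  # does not meet the 95% preference
```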

The term "comprising" means "including" as well as "consisting" eg. a composition "comprising"
X may consist exclusively of X or may include something additional to X, such as X+Y. 

A "conserved" Neisseria amino acid fragment or protein is one that is present in a particular
Neisserial protein in at least x% of Neisseria. The value of x may be 50% or more, e.g., 66%, 75%,
80%, 90%, 95% or even 100% (i.e. the amino acid is found in the protein in question in all
Neisseria). In order to determine whether an amino acid is "conserved" in a particular Neisserial
protein, it is necessary to compare that amino acid residue in the sequences of the protein in
question from a plurality of different Neisseria (a reference population). The reference population
may include a number of different Neisseria species or may include a single species. The reference
population may include a number of different serogroups of a particular species or a single
serogroup. A preferred reference population consists of the 5 most common Neisseria.

The term "heterologous" refers to two biological components that are not found together in nature. The
components may be host cells, genes, or regulatory regions, such as promoters. Although the
heterologous components are not found together in nature, they can function together, as when a
promoter heterologous to a gene is operably linked to the gene. Another example is where a
Neisserial sequence is heterologous to a mouse host cell. A further example would be two epitopes
from the same or different proteins which have been assembled in a single protein in an
arrangement not found in nature.

An "origin of replication" is a polynucleotide sequence that initiates and regulates replication of
polynucleotides, such as an expression vector. The origin of replication behaves as an autonomous
unit of polynucleotide replication within a cell, capable of replication under its own control. An
origin of replication may be needed for a vector to replicate in a particular host cell. With certain
origins of replication, an expression vector can be reproduced at a high copy number in the
presence of the appropriate proteins within the cell. Examples of origins are the autonomously
replicating sequences, which are effective in yeast; and the viral T-antigen, effective in COS-7
cells.

A "mutant" sequence is defined as DNA, RNA or amino acid sequence differing from but having
sequence identity with the native or disclosed sequence. Depending on the particular sequence, the
degree of sequence identity between the native or disclosed sequence and the mutant sequence is
preferably greater than 50% (eg. 60%, 70%, 80%, 90%, 95%, 99% or more, calculated using the
Smith-Waterman algorithm as described above). As used herein, an "allelic variant" of a nucleic
acid molecule, or region, for which nucleic acid sequence is provided herein is a nucleic acid
molecule, or region, that occurs essentially at the same locus in the genome of another or second
isolate, and that, due to natural variation caused by, for example, mutation or recombination, has
a similar but not identical nucleic acid sequence. A coding region allelic variant typically encodes
a protein having similar activity to that of the protein encoded by the gene to which it is being
compared. An allelic variant can also comprise an alteration in the 5' or 3' untranslated regions of
the gene, such as in regulatory control regions (eg. see US patent 5,753,235).

Expression systems 

The Neisserial nucleotide sequences can be expressed in a variety of different expression systems;
for example those used with mammalian cells, baculoviruses, plants, bacteria, and yeast.

i. Mammalian Systems 

Mammalian expression systems are known in the art. A mammalian promoter is any DNA
sequence capable of binding mammalian RNA polymerase and initiating the downstream (3')
transcription of a coding sequence (eg. structural gene) into mRNA. A promoter will have a
transcription initiating region, which is usually placed proximal to the 5' end of the coding
sequence, and a TATA box, usually located 25-30 base pairs (bp) upstream of the transcription
initiation site. The TATA box is thought to direct RNA polymerase II to begin RNA synthesis at
the correct site. A mammalian promoter will also contain an upstream promoter element, usually
located within 100 to 200 bp upstream of the TATA box. An upstream promoter element
determines the rate at which transcription is initiated and can act in either orientation [Sambrook
et al. (1989) "Expression of Cloned Genes in Mammalian Cells." In Molecular Cloning: A
Laboratory Manual, 2nd ed.].

Mammalian viral genes are often highly expressed and have a broad host range; therefore sequences
encoding mammalian viral genes provide particularly useful promoter sequences. Examples include
the SV40 early promoter, mouse mammary tumor virus LTR promoter, adenovirus major late
promoter (Ad MLP), and herpes simplex virus promoter. In addition, sequences derived from
non-viral genes, such as the murine metallothionein gene, also provide useful promoter sequences.
Expression may be either constitutive or regulated (inducible); depending on the promoter,
expression can be induced with glucocorticoid in hormone-responsive cells.

The presence of an enhancer element (enhancer), combined with the promoter elements described
above, will usually increase expression levels. An enhancer is a regulatory DNA sequence that can
stimulate transcription up to 1000-fold when linked to homologous or heterologous promoters, with
synthesis beginning at the normal RNA start site. Enhancers are also active when they are placed
upstream or downstream from the transcription initiation site, in either normal or flipped
orientation, or at a distance of more than 1000 nucleotides from the promoter [Maniatis et al. (1987)
Science 236:1237; Alberts et al. (1989) Molecular Biology of the Cell, 2nd ed.]. Enhancer elements
derived from viruses may be particularly useful, because they usually have a broader host range.
Examples include the SV40 early gene enhancer [Dijkema et al (1985) EMBO J. 4:761] and the
enhancer/promoters derived from the long terminal repeat (LTR) of the Rous Sarcoma Virus
[Gorman et al. (1982b) Proc. Natl. Acad. Sci. 79:6777] and from human cytomegalovirus [Boshart
et al. (1985) Cell 41:521]. Additionally, some enhancers are regulatable and become active only
in the presence of an inducer, such as a hormone or metal ion [Sassone-Corsi and Borelli (1986)
Trends Genet. 2:215; Maniatis et al. (1987) Science 236:1237].

A DNA molecule may be expressed intracellularly in mammalian cells. A promoter sequence may be
directly linked with the DNA molecule, in which case the first amino acid at the N-terminus of the
recombinant protein will always be a methionine, which is encoded by the ATG start codon. If desired,
the N-terminus may be cleaved from the protein by in vitro incubation with cyanogen bromide.

Alternatively, foreign proteins can also be secreted from the cell into the growth media by creating
chimeric DNA molecules that encode a fusion protein comprised of a leader sequence fragment that
provides for secretion of the foreign protein in mammalian cells. Preferably, there are processing
sites encoded between the leader fragment and the foreign gene that can be cleaved either in vivo
or in vitro. The leader sequence fragment usually encodes a signal peptide comprised of
hydrophobic amino acids which direct the secretion of the protein from the cell. The adenovirus
tripartite leader is an example of a leader sequence that provides for secretion of a foreign protein
in mammalian cells.

Usually, transcription termination and polyadenylation sequences recognized by mammalian cells
are regulatory regions located 3' to the translation stop codon and thus, together with the promoter
elements, flank the coding sequence. The 3' terminus of the mature mRNA is formed by
site-specific post-transcriptional cleavage and polyadenylation [Birnstiel et al. (1985) Cell 41:349;
Proudfoot and Whitelaw (1988) "Termination and 3' end processing of eukaryotic RNA." In
Transcription and splicing (ed. B.D. Hames and D.M. Glover); Proudfoot (1989) Trends Biochem.
Sci. 14:105]. These sequences direct the transcription of an mRNA which can be translated into the
polypeptide encoded by the DNA. Examples of transcription terminator/polyadenylation signals
include those derived from SV40 [Sambrook et al (1989) "Expression of cloned genes in cultured
mammalian cells." In Molecular Cloning: A Laboratory Manual].

Usually, the above described components, comprising a promoter, polyadenylation signal, and
transcription termination sequence, are put together into expression constructs. Enhancers, introns
with functional splice donor and acceptor sites, and leader sequences may also be included in an
expression construct, if desired. Expression constructs are often maintained in a replicon, such as
an extrachromosomal element (eg. plasmids) capable of stable maintenance in a host, such as
mammalian cells or bacteria. Mammalian replication systems include those derived from animal
viruses, which require trans-acting factors to replicate. For example, plasmids containing the
replication systems of papovaviruses, such as SV40 [Gluzman (1981) Cell 23:175] or
polyomavirus, replicate to extremely high copy number in the presence of the appropriate viral T
antigen. Additional examples of mammalian replicons include those derived from bovine
papillomavirus and Epstein-Barr virus. Additionally, the replicon may have two replication systems,
thus allowing it to be maintained, for example, in mammalian cells for expression and in a
prokaryotic host for cloning and amplification. Examples of such mammalian-bacteria shuttle
vectors include pMT2 [Kaufman et al. (1989) Mol. Cell. Biol. 9:946] and pHEBO [Shimizu et al.
(1986) Mol. Cell. Biol. 6:1074].

The transformation procedure used depends upon the host to be transformed. Methods for 
introduction of heterologous polynucleotides into mammalian cells are known in the art and include 
dextran-mediated transfection, calcium phosphate precipitation, polybrene mediated transfection, 
protoplast fusion, electroporation, encapsulation of the polynucleotide(s) in liposomes, and direct 
microinjection of the DNA into nuclei.

Mammalian cell lines available as hosts for expression are known in the art and include many
immortalized cell lines available from the American Type Culture Collection (ATCC), including
but not limited to, Chinese hamster ovary (CHO) cells, HeLa cells, baby hamster kidney (BHK)
cells, monkey kidney cells (COS), human hepatocellular carcinoma cells (eg. Hep G2), and a
number of other cell lines.

ii. Baculovirus Systems

The polynucleotide encoding the protein can also be inserted into a suitable insect expression
vector, and is operably linked to the control elements within that vector. Vector construction
employs techniques which are known in the art. Generally, the components of the expression
system include a transfer vector, usually a bacterial plasmid, which contains both a fragment of the
baculovirus genome, and a convenient restriction site for insertion of the heterologous gene or
genes to be expressed; a wild type baculovirus with a sequence homologous to the
baculovirus-specific fragment in the transfer vector (this allows for the homologous recombination of the
heterologous gene into the baculovirus genome); and appropriate insect host cells and growth
media.

After inserting the DNA sequence encoding the protein into the transfer vector, the vector and the
wild type viral genome are transfected into an insect host cell where the vector and viral genome
are allowed to recombine. The packaged recombinant virus is expressed and recombinant plaques
are identified and purified. Materials and methods for baculovirus/insect cell expression systems
are commercially available in kit form from, inter alia, Invitrogen, San Diego CA ("MaxBac" kit).
These techniques are generally known to those skilled in the art and fully described in Summers
and Smith, Texas Agricultural Experiment Station Bulletin No. 1555 (1987) (hereinafter "Summers
and Smith").

Prior to inserting the DNA sequence encoding the protein into the baculovirus genome, the above
described components, comprising a promoter, leader (if desired), coding sequence of interest, and
transcription termination sequence, are usually assembled into an intermediate transplacement
construct (transfer vector). This construct may contain a single gene and operably linked regulatory
elements; multiple genes, each with its own set of operably linked regulatory elements; or
multiple genes, regulated by the same set of regulatory elements. Intermediate transplacement
constructs are often maintained in a replicon, such as an extrachromosomal element (eg. plasmids)
capable of stable maintenance in a host, such as a bacterium. The replicon will have a replication
system, thus allowing it to be maintained in a suitable host for cloning and amplification.

Currently, the most commonly used transfer vector for introducing foreign genes into AcNPV is
pAc373. Many other vectors, known to those of skill in the art, have also been designed. These
include, for example, pVL985 (which alters the polyhedrin start codon from ATG to ATT, and
which introduces a BamHI cloning site 32 basepairs downstream from the ATT; see Luckow and
Summers, Virology (1989) 77:31).

The plasmid usually also contains the polyhedrin polyadenylation signal (Miller et al. (1988) Ann. 
Rev. Microbiol., 42:111) and a prokaryotic ampicillin-resistance (amp) gene and origin of 
replication for selection and propagation in E. coli. 

Baculovirus transfer vectors usually contain a baculovirus promoter. A baculovirus promoter is any 
DNA sequence capable of binding a baculovirus RNA polymerase and initiating the downstream 
(5' to 3') transcription of a coding sequence (eg. structural gene) into mRNA. A promoter will have
a transcription initiation region which is usually placed proximal to the 5' end of the coding
sequence. This transcription initiation region usually includes an RNA polymerase binding site and 
a transcription initiation site. A baculovirus transfer vector may also have a second domain called 
an enhancer, which, if present, is usually distal to the structural gene. Expression may be either 
regulated or constitutive. 

Structural genes, abundantly transcribed at late times in a viral infection cycle, provide particularly
useful promoter sequences. Examples include sequences derived from the gene encoding the viral
polyhedrin protein, Friesen et al., (1986) "The Regulation of Baculovirus Gene Expression," in:
The Molecular Biology of Baculoviruses (ed. Walter Doerfler); EPO Publ. Nos. 127 839 and 155
476; and the gene encoding the p10 protein, Vlak et al., (1988), J. Gen. Virol. 69:765.

DNA encoding suitable signal sequences can be derived from genes for secreted insect or
baculovirus proteins, such as the baculovirus polyhedrin gene (Carbonell et al. (1988) Gene,
73:409). Alternatively, since the signals for mammalian cell posttranslational modifications (such
as signal peptide cleavage, proteolytic cleavage, and phosphorylation) appear to be recognized by
insect cells, and the signals required for secretion and nuclear accumulation also appear to be
conserved between the invertebrate cells and vertebrate cells, leaders of non-insect origin, such as
those derived from genes encoding human α-interferon, Maeda et al., (1985), Nature 315:592;
human gastrin-releasing peptide, Lebacq-Verheyden et al., (1988), Molec. Cell. Biol. 8:3129;
human IL-2, Smith et al., (1985) Proc. Nat'l Acad. Sci. USA, 82:8404; mouse IL-3, Miyajima et
al., (1987) Gene 58:273; and human glucocerebrosidase, Martin et al. (1988) DNA, 7:99, can also
be used to provide for secretion in insects.

A recombinant polypeptide or polyprotein may be expressed intracellularly or, if it is expressed 
with the proper regulatory sequences, it can be secreted. Good intracellular expression of nonfused 
foreign proteins usually requires heterologous genes that ideally have a short leader sequence 
containing suitable translation initiation signals preceding an ATG start signal. If desired, 
methionine at the N-terminus may be cleaved from the mature protein by in vitro incubation with
cyanogen bromide. 

Alternatively, recombinant polyproteins or proteins which are not naturally secreted can be secreted 
from the insect cell by creating chimeric DNA molecules that encode a fusion protein comprised 
of a leader sequence fragment that provides for secretion of the foreign protein in insects. The 
leader sequence fragment usually encodes a signal peptide comprised of hydrophobic amino acids
which direct the translocation of the protein into the endoplasmic reticulum. 

After insertion of the DNA sequence and/or the gene encoding the expression product precursor 
of the protein, an insect cell host is co-transformed with the heterologous DNA of the transfer 
vector and the genomic DNA of wild type baculovirus, usually by co-transfection. The promoter
and transcription termination sequence of the construct will usually comprise a 2-5kb section of the
baculovirus genome. Methods for introducing heterologous DNA into the desired site in the
baculovirus are known in the art. (See Summers and Smith supra; Ju et al. (1987); Smith et
al., Mol. Cell. Biol. (1983) 3:2156; and Luckow and Summers (1989)). For example, the insertion
can be into a gene such as the polyhedrin gene, by homologous double crossover recombination;
insertion can also be into a restriction enzyme site engineered into the desired baculovirus gene.
Miller et al., (1989), Bioessays 4:91. The DNA sequence, when cloned in place of the polyhedrin
gene in the expression vector, is flanked both 5' and 3' by polyhedrin-specific sequences and is
positioned downstream of the polyhedrin promoter.

The newly formed baculovirus expression vector is subsequently packaged into an infectious
recombinant baculovirus. Homologous recombination occurs at low frequency (between about 1%
and about 5%); thus, the majority of the virus produced after cotransfection is still wild-type virus.
Therefore, a method is necessary to identify recombinant viruses. An advantage of the expression
system is a visual screen allowing recombinant viruses to be distinguished. The polyhedrin protein,
which is produced by the native virus, is produced at very high levels in the nuclei of infected cells
at late times after viral infection. Accumulated polyhedrin protein forms occlusion bodies that also
contain embedded particles. These occlusion bodies, up to 15 μm in size, are highly refractile,
giving them a bright shiny appearance that is readily visualized under the light microscope. Cells
infected with recombinant viruses lack occlusion bodies. To distinguish recombinant virus from
wild-type virus, the transfection supernatant is plaqued onto a monolayer of insect cells by
techniques known to those skilled in the art. Namely, the plaques are screened under the light
microscope for the presence (indicative of wild-type virus) or absence (indicative of recombinant
virus) of occlusion bodies. "Current Protocols in Microbiology" Vol. 2 (Ausubel et al. eds) at 16.8
(Supp. 10, 1990); Summers and Smith, supra; Miller et al. (1989).
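
The recombination frequency of about 1% to about 5% stated above gives a rough sense of how many plaques may need to be screened. The following is a minimal sketch only, assuming that each plaque is independently recombinant with probability equal to the recombination frequency; the function name `plaques_needed` and the 95% confidence target are illustrative assumptions and are not part of the method described here.

```python
import math


def plaques_needed(recombinant_fraction, confidence=0.95):
    """Smallest n with P(at least one recombinant among n plaques) >= confidence.

    Assumption for this sketch: every plaque is independently recombinant
    with probability `recombinant_fraction`.
    """
    if not 0.0 < recombinant_fraction < 1.0:
        raise ValueError("recombinant_fraction must be between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # P(no recombinant in n plaques) = (1 - p)^n; solve (1 - p)^n <= 1 - confidence.
    n = math.log(1.0 - confidence) / math.log(1.0 - recombinant_fraction)
    return math.ceil(n)


if __name__ == "__main__":
    for fraction in (0.01, 0.05):
        print(f"recombinant fraction {fraction:.0%}: "
              f"screen about {plaques_needed(fraction)} plaques")
```

Under these assumptions, a lower recombination frequency sharply increases the number of plaques to examine, which is why a rapid visual screen is useful.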

Recombinant baculovirus expression vectors have been developed for infection into several insect 
cells. For example, recombinant baculoviruses have been developed for, inter alia: Aedes aegypti,
Autographa californica, Bombyx mori, Drosophila melanogaster, Spodoptera frugiperda, and
Trichoplusia ni (WO 89/046699; Carbonell et al., (1985) J. Virol. 56:153; Wright (1986) Nature
321:718; Smith et al., (1983) Mol. Cell. Biol. 3:2156; and see generally, Fraser, et al. (1989) In
Vitro Cell. Dev. Biol. 25:225).

Cells and cell culture media are commercially available for both direct and fusion expression of 
heterologous polypeptides in a baculovirus/expression system; cell culture technology is generally 
known to those skilled in the art. See, eg. Summers and Smith supra. 

The modified insect cells may then be grown in an appropriate nutrient medium, which allows for
stable maintenance of the plasmid(s) present in the modified insect host. Where the expression
product gene is under inducible control, the host may be grown to high density, and expression
induced. Alternatively, where expression is constitutive, the product will be continuously expressed
into the medium and the nutrient medium must be continuously circulated, while removing the
product of interest and augmenting depleted nutrients. The product may be purified by such
techniques as chromatography, eg. HPLC, affinity chromatography, ion exchange chromatography,
etc.; electrophoresis; density gradient centrifugation; solvent extraction, or the like. As appropriate,
the product may be further purified, as required, so as to remove substantially any insect proteins
which are also secreted in the medium or result from lysis of insect cells, so as to provide a product
which is at least substantially free of host debris, eg. proteins, lipids and polysaccharides.

In order to obtain protein expression, recombinant host cells derived from the transformants are 
incubated under conditions which allow expression of the recombinant protein encoding sequence. 
These conditions will vary, dependent upon the host cell selected. However, the conditions are 
readily ascertainable to those of ordinary skill in the art, based upon what is known in the art. 

iii. Plant Systems

There are many plant cell culture and whole plant genetic expression systems known in the art. 
Exemplary plant cellular genetic expression systems include those described in patents, such as: 
US 5,693,506; US 5,659,122; and US 5,608,143. Additional examples of genetic expression in
plant cell culture have been described by Zenk, Phytochemistry 30:3861-3863 (1991). Descriptions
of plant protein signal peptides may be found in addition to the references described above in
Vaulcombe et al., Mol. Gen. Genet. 209:33-40 (1987); Chandler et al., Plant Molecular Biology
3:407-418 (1984); Rogers, J. Biol. Chem. 260:3731-3738 (1985); Rothstein et al., Gene 55:353-356
(1987); Whittier et al., Nucleic Acids Research 15:2515-2535 (1987); Wirsel et al., Molecular
Microbiology 3:3-14 (1989); Yu et al., Gene 122:247-253 (1992). A description of the regulation
of plant gene expression by the phytohormone, gibberellic acid and secreted enzymes induced by
gibberellic acid can be found in R.L. Jones and J. MacMillin, Gibberellins: in: Advanced Plant
Physiology, Malcolm B. Wilkins, ed., 1984 Pitman Publishing Limited, London, pp. 21-52.
References that describe other metabolically-regulated genes: Sheen, Plant Cell, 2:1027-1038
(1990); Maas et al., EMBO J. 9:3447-3452 (1990); Benkel and Hickey, Proc. Natl. Acad. Sci.
84:1337-1339 (1987).

Typically, using techniques known in the art, a desired polynucleotide sequence is inserted into an 
expression cassette comprising genetic regulatory elements designed for operation in plants. The 
expression cassette is inserted into a desired expression vector with companion sequences upstream 
and downstream from the expression cassette suitable for expression in a plant host. The 
companion sequences will be of plasmid or viral origin and provide necessary characteristics to the
vector to permit the vectors to move DNA from an original cloning host, such as bacteria, to the
desired plant host. The basic bacterial/plant vector construct will preferably provide a broad host
range prokaryote replication origin; a prokaryote selectable marker; and, for Agrobacterium
transformations, T DNA sequences for Agrobacterium-mediated transfer to plant chromosomes.
Where the heterologous gene is not readily amenable to detection, the construct will preferably also
have a selectable marker gene suitable for determining if a plant cell has been transformed. A
general review of suitable markers, for example for the members of the grass family, is found in
Wilmink and Dons, 1993, Plant Mol. Biol. Reptr, 11(2):165-185.

Sequences suitable for permitting integration of the heterologous sequence into the plant genome 
are also recommended. These might include transposon sequences and the like for homologous
recombination as well as Ti sequences which permit random insertion of a heterologous expression 
cassette into a plant genome. Suitable prokaryote selectable markers include resistance toward 
antibiotics such as ampicillin or tetracycline. Other DNA sequences encoding additional functions 
may also be present in the vector, as is known in the art. 

The nucleic acid molecules of the subject invention may be included into an expression cassette
for expression of the protein(s) of interest. Usually, there will be only one expression cassette,
although two or more are feasible. The recombinant expression cassette will contain, in addition
to the heterologous protein encoding sequence, the following elements: a promoter region, plant 5'
untranslated sequences, an initiation codon (depending upon whether or not the structural gene comes
equipped with one), and a transcription and translation termination sequence. Unique restriction
enzyme sites at the 5' and 3' ends of the cassette allow for easy insertion into a pre-existing vector.

A heterologous coding sequence may be for any protein relating to the present invention. The 
sequence encoding the protein of interest will encode a signal peptide which allows processing and 
translocation of the protein, as appropriate, and will usually lack any sequence which might result
in the binding of the desired protein of the invention to a membrane. Since, for the most part, the
transcriptional initiation region will be for a gene which is expressed and translocated during
germination, by employing the signal peptide which provides for translocation, one may also
provide for translocation of the protein of interest. In this way, the protein(s) of interest will be
translocated from the cells in which they are expressed and may be efficiently harvested. Typically
secretion in seeds is across the aleurone or scutellar epithelium layer into the endosperm of the
seed. While it is not required that the protein be secreted from the cells in which the protein is
produced, this facilitates the isolation and purification of the recombinant protein.

Since the ultimate expression of the desired gene product will be in a eucaryotic cell, it is desirable
to determine whether any portion of the cloned gene contains sequences which will be processed
out as introns by the host's spliceosome machinery. If so, site-directed mutagenesis of the "intron"
region may be conducted to prevent losing a portion of the genetic message as a false intron code,
Reed and Maniatis, Cell 41:95-105, 1985.

The vector can be microinjected directly into plant cells by use of micropipettes to mechanically 
transfer the recombinant DNA. Crossway, Mol. Gen. Genet., 202:179-185, 1985. The genetic
material may also be transferred into the plant cell by using polyethylene glycol, Krens, et al.,
Nature, 296, 72-74, 1982. Another method of introduction of nucleic acid segments is high
velocity ballistic penetration by small particles with the nucleic acid either within the matrix of
small beads or particles, or on the surface, Klein, et al., Nature, 327, 70-73, 1987 and Knudsen and
Muller, 1991, Planta, 185:330-336 teaching particle bombardment of barley endosperm to create
transgenic barley. Yet another method of introduction would be fusion of protoplasts with other
entities, either minicells, cells, lysosomes or other fusible lipid-surfaced bodies, Fraley, et al., Proc.
Natl. Acad. Sci. USA, 79, 1859-1863, 1982.

The vector may also be introduced into the plant cells by electroporation. (Fromm et al., Proc. Natl 
Acad. Sci. USA 82:5824, 1985). In this technique, plant protoplasts are electroporated in the 
presence of plasmids containing the gene construct. Electrical impulses of high field strength
reversibly permeabilize biomembranes allowing the introduction of the plasmids. Electroporated
plant protoplasts reform the cell wall, divide, and form plant callus. 

All plants from which protoplasts can be isolated and cultured to give whole regenerated plants can 
be transformed by the present invention so that whole plants are recovered which contain the
transferred gene. It is known that practically all plants can be regenerated from cultured cells or
tissues, including but not limited to all major species of sugarcane, sugar beet, cotton, fruit and
other trees, legumes and vegetables. Some suitable plants include, for example, species from the
genera Fragaria, Lotus, Medicago, Onobrychis, Trifolium, Trigonella, Vigna, Citrus, Linum,
Geranium, Manihot, Daucus, Arabidopsis, Brassica, Raphanus, Sinapis, Atropa, Capsicum,
Datura, Hyoscyamus, Lycopersion, Nicotiana, Solanum, Petunia, Digitalis, Majorana, Cichorium,
Helianthus, Lactuca, Bromus, Asparagus, Antirrhinum, Hererocallis, Nemesia, Pelargonium,
Panicum, Pennisetum, Ranunculus, Senecio, Salpiglossis, Cucumis, Browaalia, Glycine, Lolium,
Zea, Triticum, Sorghum, and Datura.

Means for regeneration vary from species to species of plants, but generally a suspension of 
transformed protoplasts containing copies of the heterologous gene is first provided. Callus tissue
is formed and shoots may be induced from callus and subsequently rooted. Alternatively, embryo 
formation can be induced from the protoplast suspension. These embryos germinate as natural 
embryos to form plants. The culture media will generally contain various amino acids and 
hormones, such as auxin and cytokinins. It is also advantageous to add glutamic acid and proline 
to the medium, especially for such species as corn and alfalfa. Shoots and roots normally develop
simultaneously. Efficient regeneration will depend on the medium, on the genotype, and on the 
history of the culture. If these three variables are controlled, then regeneration is fully reproducible 
and repeatable. 

In some plant cell culture systems, the desired protein of the invention may be excreted or
alternatively, the protein may be extracted from the whole plant. Where the desired protein of the
invention is secreted into the medium, it may be collected. Alternatively, the embryos and
embryoless-half seeds or other plant tissue may be mechanically disrupted to release any secreted
protein between cells and tissues. The mixture may be suspended in a buffer solution to retrieve
soluble proteins. Conventional protein isolation and purification methods will be then used to
purify the recombinant protein. Parameters of time, temperature, pH, oxygen, and volumes will be
adjusted through routine methods to optimize expression and recovery of heterologous protein.

iv. Bacterial Systems

Bacterial expression techniques are known in the art. A bacterial promoter is any DNA sequence 
capable of binding bacterial RNA polymerase and initiating the downstream (3') transcription of
a coding sequence (eg. structural gene) into mRNA. A promoter will have a transcription initiation
region which is usually placed proximal to the 5' end of the coding sequence. This transcription
initiation region usually includes an RNA polymerase binding site and a transcription initiation site.
A bacterial promoter may also have a second domain called an operator, that may overlap an
adjacent RNA polymerase binding site at which RNA synthesis begins. The operator permits
negative regulated (inducible) transcription, as a gene repressor protein may bind the operator and
thereby inhibit transcription of a specific gene. Constitutive expression may occur in the absence
of negative regulatory elements, such as the operator. In addition, positive regulation may be
achieved by a gene activator protein binding sequence, which, if present is usually proximal (5')
to the RNA polymerase binding sequence. An example of a gene activator protein is the catabolite
activator protein (CAP), which helps initiate transcription of the lac operon in Escherichia coli (E.
coli) [Raibaud et al. (1984) Annu. Rev. Genet. 18:173]. Regulated expression may therefore be
either positive or negative, thereby either enhancing or reducing transcription.

Sequences encoding metabolic pathway enzymes provide particularly useful promoter sequences. 
Examples include promoter sequences derived from sugar metabolizing enzymes, such as galactose,
lactose (lac) [Chang et al. (1977) Nature 198:1056], and maltose. Additional examples include
promoter sequences derived from biosynthetic enzymes such as tryptophan (trp) [Goeddel et al.
(1980) Nuc. Acids Res. 8:4057; Yelverton et al. (1981) Nucl. Acids Res. 9:731; US
patent 4,738,921; EP-A-0036776 and EP-A-0121775]. The β-lactamase (bla) promoter system
[Weissmann (1981) "The cloning of interferon and other mistakes." In Interferon 3 (ed. I. Gresser)],
bacteriophage lambda PL [Shimatake et al. (1981) Nature 292:128] and T5 [US patent 4,689,406]
promoter systems also provide useful promoter sequences.

In addition, synthetic promoters which do not occur in nature also function as bacterial promoters. 
For example, transcription activation sequences of one bacterial or bacteriophage promoter may 
be joined with the operon sequences of another bacterial or bacteriophage promoter, creating a
synthetic hybrid promoter [US patent 4,551,433]. For example, the tac promoter is a hybrid trp-lac
promoter comprised of both trp promoter and lac operon sequences that is regulated by the lac
repressor [Amann et al. (1983) Gene 25:167; de Boer et al. (1983) Proc. Natl. Acad. Sci. 80:21].
Furthermore, a bacterial promoter can include naturally occurring promoters of non-bacterial origin
that have the ability to bind bacterial RNA polymerase and initiate transcription. A naturally
occurring promoter of non-bacterial origin can also be coupled with a compatible RNA polymerase
to produce high levels of expression of some genes in prokaryotes. The bacteriophage T7 RNA
polymerase/promoter system is an example of a coupled promoter system [Studier et al. (1986) J.
Mol. Biol. 189:113; Tabor et al. (1985) Proc. Natl. Acad. Sci. 82:1074]. In addition, a hybrid
promoter can also be comprised of a bacteriophage promoter and an E. coli operator region
(EPO-A-0 267 851).

In addition to a functioning promoter sequence, an efficient ribosome binding site is also useful for
the expression of foreign genes in prokaryotes. In E. coli, the ribosome binding site is called the
Shine-Dalgarno (SD) sequence and includes an initiation codon (ATG) and a sequence 3-9
nucleotides in length located 3-11 nucleotides upstream of the initiation codon [Shine et al. (1975)
Nature 254:34]. The SD sequence is thought to promote binding of mRNA to the ribosome by the
pairing of bases between the SD sequence and the 3' end of E. coli 16S rRNA [Steitz et al. (1979)
"Genetic signals and nucleotide sequences in messenger RNA." In Biological Regulation and
Development: Gene Expression (ed. R.F. Goldberger)]. To express eukaryotic genes and
prokaryotic genes with weak ribosome-binding site [Sambrook et al. (1989) "Expression of cloned
genes in Escherichia coli." In Molecular Cloning: A Laboratory Manual].
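
The spacing rule for the Shine-Dalgarno sequence (a short purine-rich motif located 3-11 nucleotides upstream of the ATG) can be checked on a candidate construct with a short script. The following is a sketch under stated assumptions: it uses the commonly cited consensus AGGAGG, accepts any contiguous piece of that consensus at least 3 nucleotides long, and measures the gap as the number of nucleotides between the end of the motif and the A of the ATG. The function names, the consensus-fragment matching and the example sequence are illustrative and are not taken from the cited references.

```python
SD_CONSENSUS = "AGGAGG"  # commonly cited consensus; an assumption of this sketch


def sd_motifs(min_len=3):
    """All contiguous pieces of the consensus of length >= min_len, longest first."""
    size = len(SD_CONSENSUS)
    pieces = {SD_CONSENSUS[i:j]
              for i in range(size)
              for j in range(i + min_len, size + 1)}
    return sorted(pieces, key=lambda m: (-len(m), m))


def find_sd_sites(dna, min_len=3, min_gap=3, max_gap=11):
    """Return (atg_index, motif, gap) for each ATG preceded by an SD-like motif.

    `gap` counts the nucleotides between the end of the motif and the ATG.
    Only the longest matching motif is reported for each ATG.
    """
    dna = dna.upper()
    motifs = sd_motifs(min_len)
    hits = []
    atg = dna.find("ATG")
    while atg != -1:
        best = None
        for motif in motifs:  # longest candidates are tried first
            for gap in range(min_gap, max_gap + 1):
                begin = atg - gap - len(motif)
                if begin >= 0 and dna[begin:begin + len(motif)] == motif:
                    best = (atg, motif, gap)
                    break
            if best:
                break
        if best:
            hits.append(best)
        atg = dna.find("ATG", atg + 1)
    return hits


if __name__ == "__main__":
    # Hypothetical sequence made up for this example.
    print(find_sd_sites("TTTAGGAGGTAACATATGAAA"))
```

A real construct would normally be evaluated with more than exact string matching (for example, by estimating base pairing with the 3' end of 16S rRNA), so the result here is only a first-pass check.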

A DNA molecule may be expressed intracellularly. A promoter sequence may be directly linked 
with the DNA molecule, in which case the first amino acid at the N-terminus will always be a 
methionine, which is encoded by the ATG start codon. If desired, methionine at the N-terminus 
may be cleaved from the protein by in vitro incubation with cyanogen bromide or by either in vivo 
or in vitro incubation with a bacterial methionine N-terminal peptidase (EPO-A-0 219 237).
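
Cyanogen bromide cleaves a polypeptide on the C-terminal side of every methionine residue, not only the N-terminal one, so it is worth checking a sequence for internal methionines before choosing this route. The sketch below simply lists the fragments that would be expected; the function name `cnbr_fragments` and the example peptides are hypothetical, and side reactions or incomplete cleavage are ignored.

```python
def cnbr_fragments(protein):
    """Split a one-letter protein sequence after every methionine (M).

    Models idealised cyanogen bromide cleavage: each M ends a fragment.
    """
    fragments = []
    current = ""
    for residue in protein.upper():
        current += residue
        if residue == "M":
            fragments.append(current)
            current = ""
    if current:  # trailing residues after the last methionine
        fragments.append(current)
    return fragments


if __name__ == "__main__":
    # Hypothetical peptides made up for this example.
    print(cnbr_fragments("MKTAYIAG"))   # only the N-terminal Met is present
    print(cnbr_fragments("MKTAYMAG"))   # an internal Met is also cleaved
```

When the second situation arises, enzymatic removal with a methionine N-terminal peptidase, as mentioned above, avoids fragmenting the product.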

Fusion proteins provide an alternative to direct expression. Usually, a DNA sequence encoding the 
N-terminal portion of an endogenous bacterial protein, or other stable protein, is fused to the 5' end 
of heterologous coding sequences. Upon expression, this construct will provide a fusion of the two 
amino acid sequences. For example, the bacteriophage lambda cell gene can be linked at the 5'
terminus of a foreign gene and expressed in bacteria. The resulting fusion protein preferably retains
a site for a processing enzyme (factor Xa) to cleave the bacteriophage protein from the foreign gene
[Nagai et al. (1984) Nature 309:810]. Fusion proteins can also be made with sequences from the
lacZ [Jia et al. (1987) Gene 60:197], trpE [Allen et al. (1987) J. Biotechnol. 5:93; Makoff et al.
(1989) J. Gen. Microbiol. 135:11], and Chey [EP-A-0 324 647] genes. The DNA sequence at the
junction of the two amino acid sequences may or may not encode a cleavable site. Another example
is a ubiquitin fusion protein. Such a fusion protein is made with the ubiquitin region that preferably 
retains a site for a processing enzyme (eg. ubiquitin specific processing-protease) to cleave the 
ubiquitin from the foreign protein. Through this method, native foreign protein can be isolated 
[Miller et al. (1989) Bio/Technology 7:698].

Alternatively, foreign proteins can also be secreted from the cell by creating chimeric DNA molecules
that encode a fusion protein comprised of a signal peptide sequence fragment that provides for secretion
of the foreign protein in bacteria [US patent 4,336,336]. The signal sequence fragment usually encodes
a signal peptide comprised of hydrophobic amino acids which direct the secretion of the protein from
the cell. The protein is either secreted into the growth media (gram-positive bacteria) or into the
periplasmic space, located between the inner and outer membrane of the cell (gram-negative bacteria).
Preferably there are processing sites, which can be cleaved either in vivo or in vitro, encoded between the
signal peptide fragment and the foreign gene.
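
Because a signal peptide is characterised above by its hydrophobic amino acids, a simple hydropathy scan of the N-terminal region can serve as a rough check of a candidate leader. The sketch below uses the Kyte-Doolittle hydropathy scale, which is a choice made for this illustration rather than a method named in this specification; the window length, the 30-residue search region, the function names and the example sequence are likewise assumptions.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(peptide):
    """Average Kyte-Doolittle value over a one-letter peptide string."""
    peptide = peptide.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)


def most_hydrophobic_window(protein, window=10, search_len=30):
    """Return (start, score) of the most hydrophobic window near the N-terminus.

    Only the first `search_len` residues are scanned, since a signal
    peptide is expected at the N-terminal end of the precursor.
    """
    region = protein.upper()[:search_len]
    if len(region) < window:
        raise ValueError("sequence is shorter than the window")
    starts = range(len(region) - window + 1)
    best = max(starts, key=lambda i: mean_hydropathy(region[i:i + window]))
    return best, mean_hydropathy(region[best:best + window])


if __name__ == "__main__":
    # Hypothetical leader made up for this example: charged start, Leu-rich core.
    start, score = most_hydrophobic_window("MKKDDDLLLLLLLLLLAQA")
    print(f"most hydrophobic window starts at residue {start + 1}, "
          f"mean hydropathy {score:.2f}")
```

A high-scoring window does not by itself show that a sequence functions as a signal peptide; it only indicates that the hydrophobic core described above is present.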

DNA encoding suitable signal sequences can be derived from genes for secreted bacterial proteins, 
such as the E. coli outer membrane protein gene (ompA) [Masui et al. (1983), in: Experimental
Manipulation of Gene Expression; Ghrayeb et al. (1984) EMBO J. 3:2437] and the E. coli alkaline
phosphatase signal sequence (phoA) [Oka et al. (1985) Proc. Natl. Acad. Sci. 82:7212]. As an
additional example, the signal sequence of the alpha-amylase gene from various Bacillus strains
can be used to secrete heterologous proteins from B. subtilis [Palva et al. (1982) Proc. Natl. Acad.
Sci. USA 79:5582; EP-A-0 244 042].

Usually, transcription termination sequences recognized by bacteria are regulatory regions located 
3' to the translation stop codon, and thus together with the promoter flank the coding sequence. 
These sequences direct the transcription of an mRNA which can be translated into the polypeptide 
encoded by the DNA. Transcription termination sequences frequently include DNA sequences of 
about 50 nucleotides capable of forming stem loop structures that aid in terminating transcription.
Examples include transcription termination sequences derived from genes with strong promoters, 
such as the trp gene in E. coli as well as other biosynthetic genes. 
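
The stem loop structures mentioned above arise where a stretch of sequence is followed, after a short loop, by its reverse complement. The following sketch looks for such perfect inverted repeats in a DNA sequence. It is an illustration only: the stem length, the loop range, the function names and the example sequence are assumptions, mismatches and G-U pairing in the transcript are ignored, and overlapping hits that describe the same hairpin are all reported.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def find_hairpins(dna, stem=6, min_loop=3, max_loop=8):
    """Return (start, loop_length) for each perfect inverted repeat.

    A hit at `start` means dna[start:start+stem] can pair with the segment
    that begins `loop_length` nucleotides after the end of that stem.
    """
    dna = dna.upper()
    hits = []
    for start in range(len(dna) - 2 * stem - min_loop + 1):
        target = reverse_complement(dna[start:start + stem])
        for loop in range(min_loop, max_loop + 1):
            partner = start + stem + loop
            if partner + stem > len(dna):
                break  # longer loops would run past the end of the sequence
            if dna[partner:partner + stem] == target:
                hits.append((start, loop))
    return hits


if __name__ == "__main__":
    # Hypothetical terminator-like sequence: GC-rich stem, TTTT loop, T-rich tail.
    print(find_hairpins("AAAGCCCGCTTTTGCGGGCTTTT"))
```

Dedicated folding software would be needed to judge the stability of a predicted stem loop; this scan only flags candidate positions.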

Usually, the above described components, comprising a promoter, signal sequence (if desired), 
coding sequence of interest, and transcription termination sequence, are put together into expression
constructs. Expression constructs are often maintained in a replicon, such as an extrachromosomal
element (eg. plasmids) capable of stable maintenance in a host, such as bacteria. The replicon will
have a replication system, thus allowing it to be maintained in a prokaryotic host either for
expression or for cloning and amplification. In addition, a replicon may be either a high or low
copy number plasmid. A high copy number plasmid will generally have a copy number ranging
from about 5 to about 200, and usually about 10 to about 150. A host containing a high copy
number plasmid will preferably contain at least about 10, and more preferably at least about 20
plasmids. Either a high or low copy number vector may be selected, depending upon the effect of
the vector and the foreign protein on the host.

Alternatively, the expression constructs can be integrated into the bacterial genome with an 
integrating vector. Integrating vectors usually contain at least one sequence homologous to the
bacterial chromosome that allows the vector to integrate. Integrations appear to result from 
recombinations between homologous DNA in the vector and the bacterial chromosome. For 
example, integrating vectors constructed with DNA from various Bacillus strains integrate into the 
Bacillus chromosome (EP-A-0 127 328). Integrating vectors may also be comprised of
bacteriophage or transposon sequences.

Usually, extrachromosomal and integrating expression constructs may contain selectable markers 
to allow for the selection of bacterial strains that have been transformed. Selectable markers can 
be expressed in the bacterial host and may include genes which render bacteria resistant to drugs 
such as ampicillin, chloramphenicol, erythromycin, kanamycin (neomycin), and tetracycline 
[Davies et al. (1978) Annu. Rev. Microbiol. 32:469]. Selectable markers may also include
biosynthetic genes, such as those in the histidine, tryptophan, and leucine biosynthetic pathways. 

Alternatively, some of the above described components can be put together in transformation 
vectors. Transformation vectors are usually comprised of a selectable marker that is either
maintained in a replicon or developed into an integrating vector, as described above. 

Expression and transformation vectors, either extra-chromosomal replicons or integrating vectors,
have been developed for transformation into many bacteria. For example, expression vectors have
been developed for, inter alia, the following bacteria: Bacillus subtilis [Palva et al. (1982) Proc.
Natl. Acad. Sci. USA 79:5582; EP-A-0 036 259 and EP-A-0 063 953; WO 84/04541], Escherichia
coli [Shimatake et al. (1981) Nature 292:128; Amann et al. (1985) Gene 40:183; Studier et al.
(1986) J. Mol. Biol. 189:113; EP-A-0 036 776, EP-A-0 136 829 and EP-A-0 136 907],
Streptococcus cremoris [Powell et al. (1988) Appl. Environ. Microbiol. 54:655], Streptococcus
lividans [Powell et al. (1988) Appl. Environ. Microbiol. 54:655], Streptomyces lividans [US patent
4,745,056].

Methods of introducing exogenous DNA into bacterial hosts are well-known in the art, and usually
include either the transformation of bacteria treated with CaCl2 or other agents, such as divalent
cations and DMSO. DNA can also be introduced into bacterial cells by electroporation.
Transformation procedures usually vary with the bacterial species to be transformed. See eg.
[Masson et al. (1989) FEMS Microbiol. Lett. 60:273; Palva et al. (1982) Proc. Natl. Acad. Sci. USA
79:5582; EP-A-0 036 259 and EP-A-0 063 953; WO 84/04541, Bacillus], [Miller et al. (1988)
Proc. Natl. Acad. Sci. 85:856; Wang et al. (1990) J. Bacteriol. 172:949, Campylobacter], [Cohen
et al. (1973) Proc. Natl. Acad. Sci. 69:2110; Dower et al. (1988) Nucleic Acids Res. 16:6127;
Kushner (1978) "An improved method for transformation of Escherichia coli with ColE1-derived
plasmids." In Genetic Engineering: Proceedings of the International Symposium on Genetic
Engineering (eds. H.W. Boyer and S. Nicosia); Mandel et al. (1970) J. Mol. Biol. 55:159; Taketo
(1988) Biochim. Biophys. Acta 949:318; Escherichia], [Chassy et al. (1987) FEMS Microbiol. Lett.
44:113, Lactobacillus]; [Fiedler et al. (1988) Anal. Biochem. 170:38, Pseudomonas]; [Augustin et
al. (1990) FEMS Microbiol. Lett. 66:203, Staphylococcus], [Barany et al. (1980) J. Bacteriol.
144:698; Harlander (1987) "Transformation of Streptococcus lactis by electroporation," in:
Streptococcal Genetics (ed. J. Ferretti and R. Curtiss III); Perry et al. (1981) Infect. Immun.
52:1295; Powell et al. (1988) Appl. Environ. Microbiol. 54:655; Somkuti et al. (1987) Proc. 4th
Evr. Cong. Biotechnology 7:412, Streptococcus].

v. Yeast Expression 

Yeast expression systems are also known to one of ordinary skill in the art. A yeast promoter is any
DNA sequence capable of binding yeast RNA polymerase and initiating the downstream (3') 
transcription of a coding sequence (eg. structural gene) into mRNA. A promoter will have a 
transcription initiation region which is usually placed proximal to the 5' end of the coding sequence. 
This transcription initiation region usually includes an RNA polymerase binding site (the "TATA
Box") and a transcription initiation site. A yeast promoter may also have a second domain called
an upstream activator sequence (UAS), which, if present, is usually distal to the structural gene. 
The UAS permits regulated (inducible) expression. Constitutive expression occurs in the absence 
of a UAS. Regulated expression may be either positive or negative, thereby either enhancing or 
reducing transcription. 

Yeast is a fermenting organism with an active metabolic pathway; therefore, sequences encoding
enzymes in the metabolic pathway provide particularly useful promoter sequences. Examples
include alcohol dehydrogenase (ADH) (EP-A-0 284 044), enolase, glucokinase, glucose-6-phosphate
isomerase, glyceraldehyde-3-phosphate-dehydrogenase (GAP or GAPDH), hexokinase,
phosphofructokinase, 3-phosphoglycerate mutase, and pyruvate kinase (PyK) (EPO-A-0 329 203).
The yeast PHO5 gene, encoding acid phosphatase, also provides useful promoter sequences
[Myanohara et al. (1983) Proc. Natl. Acad. Sci. USA 80:1].

In addition, synthetic promoters which do not occur in nature also function as yeast promoters. For 
example, UAS sequences of one yeast promoter may be joined with the transcription activation 
region of another yeast promoter, creating a synthetic hybrid promoter. Examples of such hybrid 
promoters include the ADH regulatory sequence linked to the GAP transcription activation region
(US Patent Nos. 4,876,197 and 4,880,734). Other examples of hybrid promoters include promoters
which consist of the regulatory sequences of either the ADH2, GAL4, GAL10, or PHO5 genes,
combined with the transcriptional activation region of a glycolytic enzyme gene such as GAP or
PyK (EP-A-0 164 556). Furthermore, a yeast promoter can include naturally occurring promoters
of non-yeast origin that have the ability to bind yeast RNA polymerase and initiate transcription.
Examples of such promoters include, inter alia, [Cohen et al. (1980) Proc. Natl. Acad. Sci. USA
77:1078; Henikoff et al. (1981) Nature 283:835; Hollenberg et al. (1981) Curr. Topics Microbiol.
Immunol. 96:119; Hollenberg et al. (1979) "The Expression of Bacterial Antibiotic Resistance
Genes in the Yeast Saccharomyces cerevisiae," in: Plasmids of Medical, Environmental and
Commercial Importance (eds. K.N. Timmis and A. Puhler); Mercerau-Puigalon et al. (1980) Gene
11:163; Panthier et al. (1980) Curr. Genet. 2:109].

A DNA molecule may be expressed intracellularly in yeast. A promoter sequence may be directly
linked with the DNA molecule, in which case the first amino acid at the N-terminus of the
recombinant protein will always be a methionine, which is encoded by the ATG start codon. If
desired, methionine at the N-terminus may be cleaved from the protein by in vitro incubation with
cyanogen bromide.
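
The cleavage step can be illustrated with a short Python sketch. It relies on the general chemistry that cyanogen bromide cleaves the peptide bond on the C-terminal side of methionine residues, so it would cut after internal methionines as well as after the N-terminal one. The function names and the example sequences are hypothetical and are not taken from this specification.

```python
def cnbr_fragments(protein: str) -> list[str]:
    """Split a one-letter protein sequence after every methionine (M).

    Models cyanogen bromide cleavage, which cuts on the C-terminal side
    of methionine residues. Purely illustrative; chemical modification of
    the methionine itself is ignored.
    """
    fragments = []
    current = ""
    for residue in protein.upper():
        current += residue
        if residue == "M":
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments


def remove_n_terminal_met(protein: str) -> list[str]:
    """Return the fragments left after the N-terminal methionine is released."""
    if not protein.upper().startswith("M"):
        raise ValueError("directly expressed protein is expected to start with M")
    return cnbr_fragments(protein)[1:]


# Hypothetical sequences, for illustration only.
no_internal_met = "MKTAYIAKQR"
with_internal_met = "MKTMAY"
print(remove_n_terminal_met(no_internal_met))
print(remove_n_terminal_met(with_internal_met))
```

The second call shows why the approach suits proteins without internal methionine: any internal methionine splits the product into several fragments.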

Fusion proteins provide an alternative for yeast expression systems, as well as in mammalian,
baculovirus, and bacterial expression systems. Usually, a DNA sequence encoding the N-terminal
portion of an endogenous yeast protein, or other stable protein, is fused to the 5' end of
heterologous coding sequences. Upon expression, this construct will provide a fusion of the two
amino acid sequences. For example, the yeast or human superoxide dismutase (SOD) gene can be
linked at the 5' terminus of a foreign gene and expressed in yeast. The DNA sequence at the
junction of the two amino acid sequences may or may not encode a cleavable site. See eg. EP-A-0
196 056. Another example is a ubiquitin fusion protein. Such a fusion protein is made with the
ubiquitin region that preferably retains a site for a processing enzyme (eg. ubiquitin-specific
processing protease) to cleave the ubiquitin from the foreign protein. Through this method,
therefore, native foreign protein can be isolated (eg. WO88/024066).

Alternatively, foreign proteins can also be secreted from the cell into the growth media by creating
chimeric DNA molecules that encode a fusion protein comprised of a leader sequence fragment that
provides for secretion in yeast of the foreign protein. Preferably, there are processing sites encoded
between the leader fragment and the foreign gene that can be cleaved either in vivo or in vitro. The
leader sequence fragment usually encodes a signal peptide comprised of hydrophobic amino acids
which direct the secretion of the protein from the cell.

DNA encoding suitable signal sequences can be derived from genes for secreted yeast proteins,
such as the yeast invertase gene (EP-A-0 012 873; JPO. 62,096,086) and the A-factor gene (US
patent 4,588,684). Alternatively, leaders of non-yeast origin, such as an interferon leader, exist that
also provide for secretion in yeast (EP-A-0 060 057).

A preferred class of secretion leaders are those that employ a fragment of the yeast alpha-factor
gene, which contains both a "pre" signal sequence, and a "pro" region. The types of alpha-factor
fragments that can be employed include the full-length pre-pro alpha factor leader (about 83 amino
acid residues) as well as truncated alpha-factor leaders (usually about 25 to about 50 amino acid
residues) (US Patents 4,546,083 and 4,870,008; EP-A-0 324 274). Additional leaders employing
an alpha-factor leader fragment that provides for secretion include hybrid alpha-factor leaders made
with a presequence of a first yeast, but a pro-region from a second yeast alpha-factor (eg. see WO
89/02463).

Usually, transcription termination sequences recognized by yeast are regulatory regions located 3'
to the translation stop codon, and thus together with the promoter flank the coding sequence. These
sequences direct the transcription of an mRNA which can be translated into the polypeptide
encoded by the DNA. Examples of transcription terminator sequences include yeast-recognized
termination sequences such as those of genes coding for glycolytic enzymes.

Usually, the above described components, comprising a promoter, leader (if desired), coding
sequence of interest, and transcription termination sequence, are put together into expression
constructs. Expression constructs are often maintained in a replicon, such as an extrachromosomal
element (eg. plasmids) capable of stable maintenance in a host, such as yeast or bacteria. The
replicon may have two replication systems, thus allowing it to be maintained, for example, in yeast
for expression and in a prokaryotic host for cloning and amplification. Examples of such
yeast-bacteria shuttle vectors include YEp24 [Botstein et al. (1979) Gene 8:17-24], pCl/1 [Brake et al.
(1984) Proc. Natl. Acad. Sci. USA 81:4642-4646], and YRp17 [Stinchcomb et al. (1982) J. Mol.
Biol. 158:157]. In addition, a replicon may be either a high or low copy number plasmid. A high
copy number plasmid will generally have a copy number ranging from about 5 to about 200, and
usually about 10 to about 150. A host containing a high copy number plasmid will preferably have
at least about 10, and more preferably at least about 20. Either a high or low copy number vector
may be selected, depending upon the effect of the vector and the foreign protein on the host. See
eg. Brake et al., supra.
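
The ordering of the components described above can be sketched as a small Python data structure. This is only an illustrative model: the class name, field names and the example labels are hypothetical placeholders, and the sketch says nothing about the actual sequences involved.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExpressionConstruct:
    """Illustrative model of a yeast expression construct, listed 5' to 3'."""
    promoter: str
    coding_sequence: str
    terminator: str
    leader: Optional[str] = None  # the leader is optional ("if desired")

    def ordered_parts(self) -> List[str]:
        """Return the components in 5'-to-3' order, skipping an absent leader."""
        parts = [self.promoter]
        if self.leader is not None:
            parts.append(self.leader)
        parts.extend([self.coding_sequence, self.terminator])
        return parts


# Hypothetical labels, for illustration only.
secreted = ExpressionConstruct(
    promoter="ADH/GAP hybrid promoter",
    leader="alpha-factor leader",
    coding_sequence="coding sequence of interest",
    terminator="glycolytic gene terminator",
)
print(" -> ".join(secreted.ordered_parts()))
```

Leaving `leader` unset models intracellular expression, where the promoter is linked directly to the coding sequence.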

Alternatively, the expression constructs can be integrated into the yeast genome with an integrating
vector. Integrating vectors usually contain at least one sequence homologous to a yeast
chromosome that allows the vector to integrate, and preferably contain two homologous sequences
flanking the expression construct. Integrations appear to result from recombinations between
homologous DNA in the vector and the yeast chromosome [Orr-Weaver et al. (1983) Methods in
Enzymol. 101:228-245]. An integrating vector may be directed to a specific locus in yeast by
selecting the appropriate homologous sequence for inclusion in the vector. See Orr-Weaver et al.,
supra. One or more expression constructs may integrate, possibly affecting levels of recombinant
protein produced [Rine et al. (1983) Proc. Natl. Acad. Sci. USA 80:6750]. The chromosomal
sequences included in the vector can occur either as a single segment in the vector, which results
in the integration of the entire vector, or two segments homologous to adjacent segments in the
chromosome and flanking the expression construct in the vector, which can result in the stable
integration of only the expression construct.

Usually, extrachromosomal and integrating expression constructs may contain selectable markers
to allow for the selection of yeast strains that have been transformed. Selectable markers may
include biosynthetic genes that can be expressed in the yeast host, such as ADE2, HIS4, LEU2,
TRP1, and ALG7, and the G418 resistance gene, which confer resistance in yeast cells to
tunicamycin and G418, respectively. In addition, a suitable selectable marker may also provide
yeast with the ability to grow in the presence of toxic compounds, such as metal. For example, the
presence of CUP1 allows yeast to grow in the presence of copper ions [Butt et al. (1987) Microbiol.
Rev. 51:351].

Alternatively, some of the above described components can be put together into transformation 
vectors. Transformation vectors are usually comprised of a selectable marker that is either 
maintained in a replicon or developed into an integrating vector, as described above. 

Expression and transformation vectors, either extrachromosomal replicons or integrating vectors,
have been developed for transformation into many yeasts. For example, expression vectors have
been developed for, inter alia, the following yeasts: Candida albicans [Kurtz et al. (1986) Mol.
Cell. Biol. 6:142], Candida maltosa [Kunze et al. (1985) J. Basic Microbiol. 25:141], Hansenula
polymorpha [Gleeson et al. (1986) J. Gen. Microbiol. 132:3459; Roggenkamp et al. (1986) Mol.
Gen. Genet. 202:302], Kluyveromyces fragilis [Das et al. (1984) J. Bacteriol. 158:1165],
Kluyveromyces lactis [De Louvencourt et al. (1983) J. Bacteriol. 154:731; Van den Berg et al.
(1990) Bio/Technology 8:135], Pichia guillerimondii [Kunze et al. (1985) J. Basic Microbiol.
25:141], Pichia pastoris [Cregg et al. (1985) Mol. Cell. Biol. 5:3376; US Patent Nos. 4,837,148
and 4,929,555], Saccharomyces cerevisiae [Hinnen et al. (1978) Proc. Natl. Acad. Sci. USA
75:1929; Ito et al. (1983) J. Bacteriol. 153:163], Schizosaccharomyces pombe [Beach and Nurse
(1981) Nature 300:706], and Yarrowia lipolytica [Davidow et al. (1985) Curr. Genet. 10:39;
Gaillardin et al. (1985) Curr. Genet. 10:49].

Methods of introducing exogenous DNA into yeast hosts are well-known in the art, and usually
include either the transformation of spheroplasts or of intact yeast cells treated with alkali cations.
Transformation procedures usually vary with the yeast species to be transformed. See eg. [Kurtz
et al. (1986) Mol. Cell. Biol. 6:142; Kunze et al. (1985) J. Basic Microbiol. 25:141; Candida];
[Gleeson et al. (1986) J. Gen. Microbiol. 132:3459; Roggenkamp et al. (1986) Mol. Gen. Genet.
202:302; Hansenula]; [Das et al. (1984) J. Bacteriol. 158:1165; De Louvencourt et al. (1983) J.
Bacteriol. 154:1165; Van den Berg et al. (1990) Bio/Technology 8:135; Kluyveromyces]; [Cregg
et al. (1985) Mol. Cell. Biol. 5:3376; Kunze et al. (1985) J. Basic Microbiol. 25:141; US Patent
Nos. 4,837,148 and 4,929,555; Pichia]; [Hinnen et al. (1978) Proc. Natl. Acad. Sci. USA 75:1929;
Ito et al. (1983) J. Bacteriol. 153:163; Saccharomyces]; [Beach and Nurse (1981) Nature 300:706;
Schizosaccharomyces]; [Davidow et al. (1985) Curr. Genet. 10:39; Gaillardin et al. (1985) Curr.
Genet. 10:49; Yarrowia].

Antibodies 

As used herein, the term "antibody" refers to a polypeptide or group of polypeptides composed of
at least one antibody combining site. An "antibody combining site" is the three-dimensional
binding space with an internal surface shape and charge distribution complementary to the features
of an epitope of an antigen, which allows a binding of the antibody with the antigen. "Antibody"
includes, for example, vertebrate antibodies, hybrid antibodies, chimeric antibodies, humanised
antibodies, altered antibodies, univalent antibodies, Fab proteins, and single domain antibodies.

Antibodies against the proteins of the invention are useful for affinity chromatography, 
immunoassays, and distinguishing/identifying Neisserial proteins. 

Antibodies to the proteins of the invention, both polyclonal and monoclonal, may be prepared by
conventional methods. In general, the protein is first used to immunize a suitable animal, preferably
a mouse, rat, rabbit or goat. Rabbits and goats are preferred for the preparation of polyclonal sera
due to the volume of serum obtainable, and the availability of labeled anti-rabbit and anti-goat
antibodies. Immunization is generally performed by mixing or emulsifying the protein in saline,
preferably in an adjuvant such as Freund's complete adjuvant, and injecting the mixture or
emulsion parenterally (generally subcutaneously or intramuscularly). A dose of 50-200 µg/injection
is typically sufficient. Immunization is generally boosted 2-6 weeks later with one or more
injections of the protein in saline, preferably using Freund's incomplete adjuvant. One may
alternatively generate antibodies by in vitro immunization using methods known in the art, which
for the purposes of this invention is considered equivalent to in vivo immunization. Polyclonal
antisera are obtained by bleeding the immunized animal into a glass or plastic container, incubating
the blood at 25°C for one hour, followed by incubating at 4°C for 2-18 hours. The serum is
recovered by centrifugation (eg. 1,000g for 10 minutes). About 20-50 ml per bleed may be obtained
from rabbits.
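
Because the centrifugation step is stated as a relative centrifugal force (1,000g), a rotor speed has to be worked out for the particular centrifuge. The following Python sketch uses the standard relation RCF = 1.118 x 10^-5 x r x rpm^2, with r the rotor radius in centimetres. The 10 cm radius is an assumed example value and the function names are hypothetical; neither comes from this specification.

```python
import math

# Standard conversion constant: RCF = 1.118e-5 * radius_cm * rpm**2
RCF_CONSTANT = 1.118e-5


def rcf_to_rpm(rcf_g: float, rotor_radius_cm: float) -> float:
    """Return the rotor speed (rpm) giving the requested RCF (in multiples of g)."""
    if rcf_g <= 0 or rotor_radius_cm <= 0:
        raise ValueError("RCF and rotor radius must be positive")
    return math.sqrt(rcf_g / (RCF_CONSTANT * rotor_radius_cm))


def rpm_to_rcf(rpm: float, rotor_radius_cm: float) -> float:
    """Return the RCF (in multiples of g) produced at a given rotor speed."""
    return RCF_CONSTANT * rotor_radius_cm * rpm ** 2


# Assumed rotor radius of 10 cm, for illustration only.
speed = rcf_to_rpm(1000, rotor_radius_cm=10.0)
print(f"1,000g at r = 10 cm needs roughly {speed:.0f} rpm")
```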



Monoclonal antibodies are prepared using the standard method of Kohler & Milstein [Nature
(1975) 256:495-96], or a modification thereof. Typically, a mouse or rat is immunized as described
above. However, rather than bleeding the animal to extract serum, the spleen (and optionally
several large lymph nodes) is removed and dissociated into single cells. If desired, the spleen cells
may be screened (after removal of nonspecifically adherent cells) by applying a cell suspension to
a plate or well coated with the protein antigen. B-cells expressing membrane-bound
immunoglobulin specific for the antigen bind to the plate, and are not rinsed away with the rest of
the suspension. Resulting B-cells, or all dissociated spleen cells, are then induced to fuse with
myeloma cells to form hybridomas, and are cultured in a selective medium (eg. hypoxanthine,
aminopterin, thymidine medium, "HAT"). The resulting hybridomas are plated by limiting dilution,
and are assayed for the production of antibodies which bind specifically to the immunizing antigen
(and which do not bind to unrelated antigens). The selected MAb-secreting hybridomas are then
cultured either in vitro (eg. in tissue culture bottles or hollow fiber reactors), or in vivo (as ascites
in mice).

If desired, the antibodies (whether polyclonal or monoclonal) may be labeled using conventional
techniques. Suitable labels include fluorophores, chromophores, radioactive atoms (particularly ³²P
and ¹²⁵I), electron-dense reagents, enzymes, and ligands having specific binding partners. Enzymes
are typically detected by their activity. For example, horseradish peroxidase is usually detected by
its ability to convert 3,3',5,5'-tetramethylbenzidine (TMB) to a blue pigment, quantifiable with a
spectrophotometer. "Specific binding partner" refers to a protein capable of binding a ligand
molecule with high specificity, as for example in the case of an antigen and a monoclonal antibody
specific therefor. Other specific binding partners include biotin and avidin or streptavidin, IgG and
protein A, and the numerous receptor-ligand couples known in the art. It should be understood that
the above description is not meant to categorize the various labels into distinct classes, as the same
label may serve in several different modes. For example, ¹²⁵I may serve as a radioactive label or as
an electron-dense reagent. HRP may serve as enzyme or as antigen for a MAb. Further, one may
combine various labels for desired effect. For example, MAbs and avidin also require labels in the
practice of this invention: thus, one might label a MAb with biotin, and detect its presence with
avidin labeled with ¹²⁵I, or with an anti-biotin MAb labeled with HRP. Other permutations and
possibilities will be readily apparent to those of ordinary skill in the art, and are considered as
equivalents within the scope of the instant invention.

Pharmaceutical Compositions

Pharmaceutical compositions can comprise either polypeptides, antibodies, or nucleic acid of the 
invention. The pharmaceutical compositions will comprise a therapeutically effective amount of 
either polypeptides, antibodies, or polynucleotides of the claimed invention. 

The term "therapeutically effective amount" as used herein refers to an amount of a therapeutic
agent to treat, ameliorate, or prevent a desired disease or condition, or to exhibit a detectable
therapeutic or preventative effect. The effect can be detected by, for example, chemical markers or
antigen levels. Therapeutic effects also include reduction in physical symptoms, such as decreased
body temperature. The precise effective amount for a subject will depend upon the subject's size
and health, the nature and extent of the condition, and the therapeutics or combination of
therapeutics selected for administration. Thus, it is not useful to specify an exact effective amount
in advance. However, the effective amount for a given situation can be determined by routine
experimentation and is within the judgement of the clinician.

For purposes of the present invention, an effective dose will be from about 0.01 mg/kg to 50 mg/kg
or 0.05 mg/kg to about 10 mg/kg of the DNA constructs in the individual to which it is
administered.
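
The per-kilogram ranges above translate into absolute amounts by simple multiplication. The Python sketch below shows only that arithmetic; the 70 kg body mass is an assumed example value, the names are hypothetical, and the actual amount remains a matter for routine experimentation and the clinician as stated above.

```python
# Dose ranges stated in the text, in mg of DNA construct per kg body mass.
BROAD_RANGE_MG_PER_KG = (0.01, 50.0)
NARROW_RANGE_MG_PER_KG = (0.05, 10.0)


def dose_range_mg(body_mass_kg: float, range_mg_per_kg: tuple) -> tuple:
    """Scale a (low, high) mg/kg range to absolute mg for one individual."""
    if body_mass_kg <= 0:
        raise ValueError("body mass must be positive")
    low, high = range_mg_per_kg
    return (low * body_mass_kg, high * body_mass_kg)


# Assumed body mass of 70 kg, for illustration only.
for label, rng in (("broad", BROAD_RANGE_MG_PER_KG), ("narrow", NARROW_RANGE_MG_PER_KG)):
    low_mg, high_mg = dose_range_mg(70.0, rng)
    print(f"{label}: {low_mg:.2f} mg to {high_mg:.2f} mg")
```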

A pharmaceutical composition can also contain a pharmaceutically acceptable carrier. The term
"pharmaceutically acceptable carrier" refers to a carrier for administration of a therapeutic agent,
such as antibodies or a polypeptide, genes, and other therapeutic agents. The term refers to any
pharmaceutical carrier that does not itself induce the production of antibodies harmful to the
individual receiving the composition, and which may be administered without undue toxicity.
Suitable carriers may be large, slowly metabolized macromolecules such as proteins,
polysaccharides, polylactic acids, polyglycolic acids, polymeric amino acids, amino acid
copolymers, and inactive virus particles. Such carriers are well known to those of ordinary skill in
the art.

Pharmaceutically acceptable salts can be used therein, for example, mineral acid salts such as
hydrochlorides, hydrobromides, phosphates, sulfates, and the like; and the salts of organic acids
such as acetates, propionates, malonates, benzoates, and the like. A thorough discussion of
pharmaceutically acceptable excipients is available in Remington's Pharmaceutical Sciences (Mack
Pub. Co., N.J. 1991).

Pharmaceutically acceptable carriers in therapeutic compositions may contain liquids such as water,
saline, glycerol and ethanol. Additionally, auxiliary substances, such as wetting or emulsifying
agents, pH buffering substances, and the like, may be present in such vehicles. Typically, the
therapeutic compositions are prepared as injectables, either as liquid solutions or suspensions; solid
forms suitable for solution in, or suspension in, liquid vehicles prior to injection may also be
prepared. Liposomes are included within the definition of a pharmaceutically acceptable carrier.

Delivery Methods 

Once formulated, the compositions of the invention can be administered directly to the subject. The
subjects to be treated can be animals; in particular, human subjects can be treated.

Direct delivery of the compositions will generally be accomplished by injection, either
subcutaneously, intraperitoneally, intravenously or intramuscularly or delivered to the interstitial
space of a tissue. The compositions can also be administered into a lesion. Other modes of
administration include oral and pulmonary administration, suppositories, and transdermal or
transcutaneous applications (eg. see WO98/20734), needles, and gene guns or hyposprays. Dosage
treatment may be a single dose schedule or a multiple dose schedule.

Vaccines 

Vaccines according to the invention may either be prophylactic (ie. to prevent infection) or
therapeutic (ie. to treat disease after infection).

Such vaccines comprise immunising antigen(s), immunogen(s), polypeptide(s), protein(s) or nucleic acid,
usually in combination with "pharmaceutically acceptable carriers," which include any carrier that does
not itself induce the production of antibodies harmful to the individual receiving the composition.
Suitable carriers are typically large, slowly metabolized macromolecules such as proteins,
polysaccharides, polylactic acids, polyglycolic acids, polymeric amino acids, amino acid copolymers,
lipid aggregates (such as oil droplets or liposomes), and inactive virus particles. Such carriers are well
known to those of ordinary skill in the art. Additionally, these carriers may function as
immunostimulating agents ("adjuvants"). Furthermore, the antigen or immunogen may be conjugated
to a bacterial toxoid, such as a toxoid from diphtheria, tetanus, cholera, H. pylori, etc. pathogens.

Preferred adjuvants to enhance effectiveness of the composition include, but are not limited to: (1)
aluminum salts (alum), such as aluminum hydroxide, aluminum phosphate, aluminum sulfate, etc; 
(2) oil-in-water emulsion formulations (with or without other specific immunostimulating agents
such as muramyl peptides (see below) or bacterial cell wall components), such as for example (a)
MF59™ (WO 90/14837; Chapter 10 in Vaccine design: the subunit and adjuvant approach, eds.
Powell & Newman, Plenum Press 1995), containing 5% Squalene, 0.5% Tween 80, and 0.5% Span
85 (optionally containing various amounts of MTP-PE (see below), although not required)
formulated into submicron particles using a microfluidizer such as Model 110Y microfluidizer
(Microfluidics, Newton, MA), (b) SAF, containing 10% Squalane, 0.4% Tween 80, 5%
pluronic-blocked polymer L121, and thr-MDP (see below) either microfluidized into a submicron emulsion
or vortexed to generate a larger particle size emulsion, and (c) Ribi™ adjuvant system (RAS), (Ribi
Immunochem, Hamilton, MT) containing 2% Squalene, 0.2% Tween 80, and one or more bacterial
cell wall components from the group consisting of monophosphorylipid A (MPL), trehalose
dimycolate (TDM), and cell wall skeleton (CWS), preferably MPL + CWS (Detox™); (3) saponin
adjuvants, such as Stimulon™ (Cambridge Bioscience, Worcester, MA) may be used or particles
generated therefrom such as ISCOMs (immunostimulating complexes); (4) Complete Freund's
Adjuvant (CFA) and Incomplete Freund's Adjuvant (IFA); (5) cytokines, such as interleukins (eg.
IL-1, IL-2, IL-4, IL-5, IL-6, IL-7, IL-12, etc.), interferons (eg. gamma interferon), macrophage
colony stimulating factor (M-CSF), tumor necrosis factor (TNF), etc; and (6) other substances that
act as immunostimulating agents to enhance the effectiveness of the composition. Alum and
MF59™ are preferred.
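
The percentage compositions quoted for the oil-in-water emulsions can be turned into per-batch amounts with simple arithmetic, sketched below in Python. The text does not state whether the percentages are by volume or by weight; the sketch assumes volume/volume purely for illustration. The dictionary keys, function name and 100 ml batch size are hypothetical, and additional components such as thr-MDP or the bacterial cell wall components are left out because no percentages are given for them.

```python
# Percentages as listed in the text; treated here as % v/v (an assumption).
FORMULATIONS = {
    "MF59": {"squalene": 5.0, "Tween 80": 0.5, "Span 85": 0.5},
    "SAF": {"squalane": 10.0, "Tween 80": 0.4, "pluronic-blocked polymer L121": 5.0},
    "RAS": {"squalene": 2.0, "Tween 80": 0.2},
}


def component_volumes_ml(name: str, batch_volume_ml: float) -> dict:
    """Return the volume of each listed component for a batch of the given size.

    The entry "remainder" is whatever is left after the listed components
    (aqueous phase and any components without a stated percentage).
    """
    if batch_volume_ml <= 0:
        raise ValueError("batch volume must be positive")
    percents = FORMULATIONS[name]
    volumes = {part: batch_volume_ml * pct / 100.0 for part, pct in percents.items()}
    volumes["remainder"] = batch_volume_ml - sum(volumes.values())
    return volumes


# Assumed 100 ml batch, for illustration only.
for part, volume in component_volumes_ml("MF59", 100.0).items():
    print(f"{part}: {volume:.2f} ml")
```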

As mentioned above, muramyl peptides include, but are not limited to,
N-acetyl-muramyl-L-threonyl-D-isoglutamine (thr-MDP), N-acetyl-normuramyl-L-alanyl-D-isoglutamine (nor-MDP),
N-acetylmuramyl-L-alanyl-D-isoglutaminyl-L-alanine-2-(1'-2'-dipalmitoyl-sn-glycero-3-hydroxyphosphoryloxy)-ethylamine
(MTP-PE), etc.

The immunogenic compositions (eg. the immunising antigen/immunogen/polypeptide/protein/
nucleic acid, pharmaceutically acceptable carrier, and adjuvant) typically will contain diluents, such
as water, saline, glycerol, ethanol, etc. Additionally, auxiliary substances, such as wetting or
emulsifying agents, pH buffering substances, and the like, may be present in such vehicles.
Typically, the immunogenic compositions are prepared as injectables, either as liquid solutions or
suspensions; solid forms suitable for solution in, or suspension in, liquid vehicles prior to injection
may also be prepared. The preparation also may be emulsified or encapsulated in liposomes for
enhanced adjuvant effect, as discussed above under pharmaceutically acceptable carriers.

Immunogenic compositions used as vaccines comprise an immunologically effective amount of the
antigenic or immunogenic polypeptides, as well as any other of the above-mentioned components,
as needed. By "immunologically effective amount", it is meant that the administration of that
amount to an individual, either in a single dose or as part of a series, is effective for treatment or
prevention. This amount varies depending upon the health and physical condition of the individual
to be treated, the taxonomic group of individual to be treated (eg. nonhuman primate, primate, etc.),
the capacity of the individual's immune system to synthesize antibodies, the degree of protection
desired, the formulation of the vaccine, the treating doctor's assessment of the medical situation,
and other relevant factors. It is expected that the amount will fall in a relatively broad range that
can be determined through routine trials.

The immunogenic compositions are conventionally administered parenterally, eg. by injection,
either subcutaneously, intramuscularly, or transdermally/transcutaneously (eg. WO98/20734).
Additional formulations suitable for other modes of administration include oral and pulmonary
formulations, suppositories, and transdermal applications. Dosage treatment may be a single dose
schedule or a multiple dose schedule. The vaccine may be administered in conjunction with other
immunoregulatory agents.

As an alternative to protein-based vaccines, DNA vaccination may be employed [eg. Robinson & 
Torres (1997) Seminars in Immunology 9:271-283; Donnelly et al (1997) Annu Rev Immunol 
15:617-648; see later herein]. 

Gene Delivery Vehicles 

Gene therapy vehicles for delivery of constructs including a coding sequence of a therapeutic of
the invention, to be delivered to the mammal for expression in the mammal, can be administered
either locally or systemically. These constructs can utilize viral or non-viral vector approaches in
in vivo or ex vivo modality. Expression of such coding sequence can be induced using endogenous
mammalian or heterologous promoters. Expression of the coding sequence in vivo can be either
constitutive or regulated.

The invention includes gene delivery vehicles capable of expressing the contemplated nucleic acid
sequences. The gene delivery vehicle is preferably a viral vector and, more preferably, a retroviral,
adenoviral, adeno-associated viral (AAV), herpes viral, or alphavirus vector. The viral vector can
also be an astrovirus, coronavirus, orthomyxovirus, papovavirus, paramyxovirus, parvovirus,
picornavirus, poxvirus, or togavirus viral vector. See generally, Jolly (1994) Cancer Gene Therapy
1:51-64; Kimura (1994) Human Gene Therapy 5:845-852; Connelly (1995) Human Gene Therapy
6:185-193; and Kaplitt (1994) Nature Genetics 6:148-153.

Retroviral vectors are well known in the art and we contemplate that any retroviral gene therapy
vector is employable in the invention, including B, C and D type retroviruses, xenotropic
retroviruses (for example, NZB-X1, NZB-X2 and NZB9-1 (see O'Neill (1985) J. Virol. 53:160)),
polytropic retroviruses eg. MCF and MCF-MLV (see Kelly (1983) J. Virol. 45:291), spumaviruses
and lentiviruses. See RNA Tumor Viruses, Second Edition, Cold Spring Harbor Laboratory, 1985.

Portions of the retroviral gene therapy vector may be derived from different retroviruses. For
example, retrovector LTRs may be derived from a Murine Sarcoma Virus, a tRNA binding site
from a Rous Sarcoma Virus, a packaging signal from a Murine Leukemia Virus, and an origin of
second strand synthesis from an Avian Leukosis Virus.

These recombinant retroviral vectors may be used to generate transduction competent retroviral
vector particles by introducing them into appropriate packaging cell lines (see US patent
5,591,624). Retrovirus vectors can be constructed for site-specific integration into host cell DNA
by incorporation of a chimeric integrase enzyme into the retroviral particle (see WO96/37626). It
is preferable that the recombinant viral vector is a replication defective recombinant virus.

Packaging cell lines suitable for use with the above-described retrovirus vectors are well known
in the art, are readily prepared (see WO95/30763 and WO92/05266), and can be used to create
producer cell lines (also termed vector cell lines or "VCLs") for the production of recombinant
vector particles. Preferably, the packaging cell lines are made from human parent cells (eg. HT1080
cells) or mink parent cell lines, which eliminates inactivation in human serum.

Preferred retroviruses for the construction of retroviral gene therapy vectors include Avian
Leukosis Virus, Bovine Leukemia Virus, Murine Leukemia Virus, Mink-Cell Focus-Inducing
Virus, Murine Sarcoma Virus, Reticuloendotheliosis Virus and Rous Sarcoma Virus. Particularly
preferred Murine Leukemia Viruses include 4070A and 1504A (Hartley and Rowe (1976) J. Virol.
19:19-25), Abelson (ATCC No. VR-999), Friend (ATCC No. VR-245), Graffi, Gross (ATCC No.
VR-590), Kirsten, Harvey Sarcoma Virus and Rauscher (ATCC No. VR-998) and Moloney Murine
Leukemia Virus (ATCC No. VR-190). Such retroviruses may be obtained from depositories or
collections such as the American Type Culture Collection ("ATCC") in Rockville, Maryland or
isolated from known sources using commonly available techniques.

Exemplary known retroviral gene therapy vectors employable in this invention include those
described in patent applications GB2200651, EP0415731, EP0345242, EP0334301, WO89/02468,
WO89/05349, WO89/09271, WO90/02806, WO90/07936, WO94/03622, WO93/25698,
WO93/25234, WO93/11230, WO93/10218, WO91/02805, WO91/02825, WO95/07994, US
5,219,740, US 4,405,712, US 4,861,719, US 4,980,289, US 4,777,127, US 5,591,624. See also Vile
(1993) Cancer Res 53:3860-3864; Vile (1993) Cancer Res 53:962-967; Ram (1993) Cancer Res
53:83-88; Takamiya (1992) J Neurosci Res 33:493-503; Baba (1993) J Neurosurg
79:729-735; Mann (1983) Cell 33:153; Cane (1984) Proc Natl Acad Sci 81:6349; and Miller (1990)
Human Gene Therapy 1.

Human adenoviral gene therapy vectors are also known in the art and employable in this invention.
See, for example, Berkner (1988) Biotechniques 6:616 and Rosenfeld (1991) Science 252:431, and
WO93/07283, WO93/06223, and WO93/07282. Exemplary known adenoviral gene therapy vectors
employable in this invention include those described in the above referenced documents and in
WO94/12649, WO93/03769, WO93/19191, WO94/28938, WO95/11984, WO95/00655,
WO95/27071, WO95/29993, WO95/34671, WO96/05320, WO94/08026, WO94/11506,
WO93/06223, WO94/24299, WO95/14102, WO95/24297, WO95/02697, WO94/28152,
WO94/24299, WO95/09241, WO95/25807, WO95/05835, WO94/18922 and WO95/09654.
Alternatively, administration of DNA linked to killed adenovirus as described in Curiel (1992)
Hum. Gene Ther. 3:147-154 may be employed. The gene delivery vehicles of the invention also
include adenovirus associated virus (AAV) vectors. Leading and preferred examples of such
vectors for use in this invention are the AAV-2 based vectors disclosed in Srivastava,
WO93/09239. Most preferred AAV vectors comprise the two AAV inverted terminal repeats in
which the native D-sequences are modified by substitution of nucleotides, such that at least 5 native
nucleotides and up to 18 native nucleotides, preferably at least 10 native nucleotides up to 18 native
nucleotides, most preferably 10 native nucleotides are retained and the remaining nucleotides of
the D-sequence are deleted or replaced with non-native nucleotides. The native D-sequences of the
AAV inverted terminal repeats are sequences of 20 consecutive nucleotides in each AAV inverted
terminal repeat (ie. there is one sequence at each end) which are not involved in HP formation. The
non-native replacement nucleotide may be any nucleotide other than the nucleotide found in the
native D-sequence in the same position. Other employable exemplary AAV vectors are pWP-19,
pWN-1, both of which are disclosed in Nahreini (1993) Gene 124:257-262. Another example of
such an AAV vector is psub201 (see Samulski (1987) J. Virol. 61:3096). Another exemplary AAV
vector is the Double-D ITR vector. Construction of the Double-D ITR vector is disclosed in US
Patent 5,478,745. Still other vectors are those disclosed in Carter US Patent 4,797,368 and
Muzyczka US Patent 5,139,941, Chartejee US Patent 5,474,935, and Kotin WO94/288157. Yet a
further example of an AAV vector employable in this invention is SSV9AFABTKneo, which
contains the AFP enhancer and albumin promoter and directs expression predominantly in the liver.
Its structure and construction are disclosed in Su (1996) Human Gene Therapy 7:463-470.
Additional AAV gene therapy vectors are described in US 5,354,678, US 5,173,414, US 5,139,941,
and US 5,252,479.
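
The numerical limits on the modified D-sequence described above (20 native nucleotides, of which 5 to 18 are retained, preferably 10 to 18, most preferably 10) can be expressed as a simple check. The Python sketch below is illustrative only: it assumes the native and modified sequences are supplied as aligned 20-character strings with "-" marking a deleted position, and the 20-nucleotide example string is a made-up placeholder, not the actual AAV D-sequence.

```python
D_SEQUENCE_LENGTH = 20  # native D-sequence length stated in the text


def count_retained(native: str, modified: str) -> int:
    """Count positions where the modified D-sequence keeps the native nucleotide.

    Both strings must be aligned and 20 characters long; "-" marks a deletion.
    """
    if len(native) != D_SEQUENCE_LENGTH or len(modified) != D_SEQUENCE_LENGTH:
        raise ValueError("aligned sequences must both be 20 characters long")
    return sum(1 for n, m in zip(native.upper(), modified.upper()) if n == m)


def classify_d_sequence(native: str, modified: str) -> dict:
    """Report which of the stated ranges the modified D-sequence falls into."""
    retained = count_retained(native, modified)
    return {
        "retained": retained,
        "within_range": 5 <= retained <= 18,
        "preferred": 10 <= retained <= 18,
        "most_preferred": retained == 10,
    }


# Placeholder sequence, NOT the real AAV D-sequence.
native_example = "ACGTACGTACGTACGTACGT"
modified_example = native_example[:10] + "-" * 10  # last 10 positions deleted
print(classify_d_sequence(native_example, modified_example))
```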

20 The gene therapy vectors of the invention also include herpes vectors. Leading and preferred 
examples are herpes simplex virus vectors containing a sequence encoding a thymidine kinase 
polypeptide such as those disclosed in US 5,288,641 and EP0176170 (Roizman). Additional 
exemplary herpes simplex virus vectors include HFEM/ICP6-LacZ disclosed in WO95/04139 
(Wistar Institute), pHSVlac described in Geller (1988) Science 241:1667-1669 and in WO90/09441
and WO92/07945, HSV Us3::pgC-lacZ described in Fink (1992) Human Gene Therapy 3:11-19
and HSV 7134, 2 RH 105 and GAL4 described in EP 0453242 (Breakefield), and those deposited 
with the ATCC as accession numbers ATCC VR-977 and ATCC VR-260. 

Also contemplated are alpha virus gene therapy vectors that can be employed in this invention. 
Preferred alpha virus vectors are Sindbis virus vectors. Also suitable are togaviruses, Semliki
Forest virus (ATCC VR-67; ATCC VR-1247), Middelburg virus (ATCC VR-370), Ross River virus
(ATCC VR-373; ATCC VR-1246), Venezuelan equine encephalitis virus (ATCC VR-923; ATCC
VR-1250; ATCC VR-1249; ATCC VR-532), and those described in US patents 5,091,309, 5,217,879, and
WO92/10578. More particularly, those alpha virus vectors described in US Serial No. 08/405,627,
filed March 15, 1995, WO94/21792, WO92/10578, WO95/07994, US 5,091,309 and US 5,217,879
are employable. Such alpha viruses may be obtained from depositories or collections such as the
ATCC in Rockville, Maryland or isolated from known sources using commonly available 
techniques. Preferably, alphavirus vectors with reduced cytotoxicity are used (see USSN 
08/679640). 

DNA vector systems such as eukaryotic layered expression systems are also useful for expressing
the nucleic acids of the invention. See WO95/07994 for a detailed description of eukaryotic layered
expression systems. Preferably, the eukaryotic layered expression systems of the invention are 
derived from alphavirus vectors and most preferably from Sindbis viral vectors. 

Other viral vectors suitable for use in the present invention include those derived from poliovirus,
for example ATCC VR-58 and those described in Evans, Nature 339 (1989) 385 and Sabin (1973)
J. Biol. Standardization 1:115; rhinovirus, for example ATCC VR-1110 and those described in
Arnold (1990) J Cell Biochem L401; pox viruses such as canary pox virus or vaccinia virus, for
example ATCC VR-111 and ATCC VR-2010 and those described in Fisher-Hoch (1989) Proc Natl
Acad Sci 86:317; Flexner (1989) Ann NY Acad Sci 569:86, Flexner (1990) Vaccine 8:17; in US
4,603,112 and US 4,769,330 and WO89/01973; SV40 virus, for example ATCC VR-305 and those
described in Mulligan (1979) Nature 277:108 and Madzak (1992) J Gen Virol 73:1533; influenza
virus, for example ATCC VR-797 and recombinant influenza viruses made employing reverse
genetics techniques as described in US 5,166,057 and in Enami (1990) Proc Natl Acad Sci
87:3802-3805; Enami & Palese (1991) J Virol 65:2711-2713 and Luytjes (1989) Cell 59:110 (see
also McMichael (1983) NEJ Med 309:13, and Yap (1978) Nature 273:238 and Nature (1979)
277:108); human immunodeficiency virus as described in EP-0386882 and in Buchschacher (1992)
J. Virol 66:2731; measles virus, for example ATCC VR-67 and VR-1247 and those described in
EP-0440219; Aura virus, for example ATCC VR-368; Bebaru virus, for example ATCC VR-600
and ATCC VR-1240; Cabassou virus, for example ATCC VR-922; Chikungunya virus, for
example ATCC VR-64 and ATCC VR-1241; Fort Morgan Virus, for example ATCC VR-924;
Getah virus, for example ATCC VR-369 and ATCC VR-1243; Kyzylagach virus, for example
ATCC VR-927; Mayaro virus, for example ATCC VR-66; Mucambo virus, for example ATCC
VR-580 and ATCC VR-1244; Ndumu virus, for example ATCC VR-371; Pixuna virus, for
example ATCC VR-372 and ATCC VR-1245; Tonate virus, for example ATCC VR-925; Triniti
virus, for example ATCC VR-469; Una virus, for example ATCC VR-374; Whataroa virus, for
example ATCC VR-926; Y-62-33 virus, for example ATCC VR-375; O'Nyong virus, Eastern
encephalitis virus, for example ATCC VR-65 and ATCC VR-1242; Western encephalitis virus, for
example ATCC VR-70, ATCC VR-1251, ATCC VR-622 and ATCC VR-1252; and coronavirus,
for example ATCC VR-740 and those described in Hamre (1966) Proc Soc Exp Biol Med 121:190.

Delivery of the compositions of this invention into cells is not limited to the above mentioned viral
vectors. Other delivery methods and media may be employed such as, for example, nucleic acid
expression vectors, polycationic condensed DNA linked or unlinked to killed adenovirus alone, for
example see US Serial No. 08/366,787, filed December 30, 1994 and Curiel (1992) Hum Gene Ther
3:147-154, ligand linked DNA, for example see Wu (1989) J Biol Chem 264:16985-16987,
eukaryotic cell delivery vehicles, for example see US Serial No. 08/240,030, filed May 9,
1994, and US Serial No. 08/404,796, deposition of photopolymerized hydrogel materials,
hand-held gene transfer particle gun, as described in US Patent 5,149,655, ionizing radiation as
described in US 5,206,152 and in WO92/11033, nucleic charge neutralization or fusion with cell
membranes. Additional approaches are described in Philip (1994) Mol Cell Biol 14:2411-2418 and
in Woffendin (1994) Proc Natl Acad Sci 91:1581-1585.

Particle mediated gene transfer may be employed, for example see US Serial No. 60/023,867.
Briefly, the sequence can be inserted into conventional vectors that contain conventional control
sequences for high level expression, and then incubated with synthetic gene transfer molecules such
as polymeric DNA-binding cations like polylysine, protamine, and albumin, linked to cell targeting
ligands such as asialoorosomucoid, as described in Wu & Wu (1987) J. Biol. Chem.
262:4429-4432, insulin as described in Hucked (1990) Biochem Pharmacol 40:253-263, galactose
as described in Plank (1992) Bioconjugate Chem 3:533-539, lactose or transferrin.

Naked DNA may also be employed. Exemplary naked DNA introduction methods are described 
in WO 90/11092 and US 5,580,859. Uptake efficiency may be improved using biodegradable latex
beads. DNA coated latex beads are efficiently transported into cells after endocytosis initiation by
the beads. The method may be improved further by treatment of the beads to increase
hydrophobicity and thereby facilitate disruption of the endosome and release of the DNA into the
cytoplasm. 

Liposomes that can act as gene delivery vehicles are described in US 5,422,120, WO95/13796,
WO94/23697, WO91/14445 and EP-524,968. As described in USSN 60/023,867, on non-viral
delivery, the nucleic acid sequences encoding a polypeptide can be inserted into conventional
vectors that contain conventional control sequences for high level expression, and then be incubated
with synthetic gene transfer molecules such as polymeric DNA-binding cations like polylysine,
protamine, and albumin, linked to cell targeting ligands such as asialoorosomucoid, insulin,
galactose, lactose, or transferrin. Other delivery systems include the use of liposomes to encapsulate
DNA comprising the gene under the control of a variety of tissue-specific or ubiquitously-active
promoters. Further non-viral delivery suitable for use includes mechanical delivery systems such
as the approach described in Woffendin et al (1994) Proc. Natl. Acad. Sci. USA
91(24):11581-11585. Moreover, the coding sequence and the product of expression of such can be
delivered through deposition of photopolymerized hydrogel materials. Other conventional methods
for gene delivery that can be used for delivery of the coding sequence include, for example, use of
hand-held gene transfer particle gun, as described in US 5,149,655; use of ionizing radiation for
activating transferred gene, as described in US 5,206,152 and WO92/11033.

Exemplary liposome and polycationic gene delivery vehicles are those described in US 5,422,120
and 4,762,915; in WO 95/13796; WO94/23697; and WO91/14445; in EP-0524968; and in Stryer,
Biochemistry, pages 236-240 (1975) W.H. Freeman, San Francisco; Szoka (1980) Biochem
Biophys Acta 600:1; Bayer (1979) Biochem Biophys Acta 550:464; Rivnay (1987) Meth Enzymol
149:119; Wang (1987) Proc Natl Acad Sci 84:7851; Plant (1989) Anal Biochem 176:420.

A polynucleotide composition can comprise a therapeutically effective amount of a gene therapy
vehicle, as the term is defined above. For purposes of the present invention, an effective dose will
be from about 0.01 mg/kg to 50 mg/kg or 0.05 mg/kg to about 10 mg/kg of the DNA constructs
in the individual to which it is administered.



Delivery Methods 

Once formulated, the polynucleotide compositions of the invention can be (1) administered directly
to the subject; (2) delivered ex vivo, to cells derived from the subject; or (3) delivered in vitro for
expression of recombinant proteins. The subjects to be treated can be mammals or birds. Also,
human subjects can be treated.

Direct delivery of the compositions will generally be accomplished by injection, either
subcutaneously, intraperitoneally, intravenously or intramuscularly or delivered to the interstitial
space of a tissue. The compositions can also be administered into a lesion. Other modes of
administration include oral and pulmonary administration, suppositories, and transdermal or
transcutaneous applications (eg. see WO98/20734), needles, and gene guns or hyposprays. Dosage
treatment may be a single dose schedule or a multiple dose schedule.

Methods for the ex vivo delivery and reimplantation of transformed cells into a subject are known
in the art and described in eg. WO93/14778. Examples of cells useful in ex vivo applications
include, for example, stem cells, particularly hematopoietic, lymph cells, macrophages, dendritic
cells, or tumor cells.

Generally, delivery of nucleic acids for both ex vivo and in vitro applications can be accomplished 
by the following procedures, for example, dextran-mediated transfection, calcium phosphate 
precipitation, polybrene mediated transfection, protoplast fusion, electroporation, encapsulation of
the polynucleotide(s) in liposomes, and direct microinjection of the DNA into nuclei, all well 
known in the art. 

Polynucleotide and polypeptide pharmaceutical compositions

In addition to the pharmaceutically acceptable carriers and salts described above, the following
additional agents can be used with polynucleotide and/or polypeptide compositions.

A. Polypeptides

Examples are polypeptides, which include, without limitation: asialoorosomucoid (ASOR);
transferrin; asialoglycoproteins; antibodies; antibody fragments; ferritin; interleukins; interferons;
granulocyte-macrophage colony stimulating factor (GM-CSF); granulocyte colony stimulating
factor (G-CSF); macrophage colony stimulating factor (M-CSF); stem cell factor; and
erythropoietin. Viral antigens, such as envelope proteins, can also be used. Also, proteins from
other invasive organisms can be used, such as the 17 amino acid peptide from the circumsporozoite
protein of Plasmodium falciparum known as RII.

B. Hormones, Vitamins, etc.

Other groups that can be included are, for example: hormones, steroids, androgens, estrogens,
thyroid hormone, or vitamins, folic acid.

C. Polyalkylenes, Polysaccharides, etc.

Also, polyalkylene glycol can be included with the desired polynucleotides/polypeptides. In a
preferred embodiment, the polyalkylene glycol is polyethylene glycol. In addition, mono-, di-, or
polysaccharides can be included. In a preferred embodiment of this aspect, the polysaccharide is
dextran or DEAE-dextran. Chitosan and poly(lactide-co-glycolide) can also be included.

D. Lipids, and Liposomes

The desired polynucleotide/polypeptide can also be encapsulated in lipids or packaged in liposomes
prior to delivery to the subject or to cells derived therefrom. 

Lipid encapsulation is generally accomplished using liposomes which are able to stably bind or 
entrap and retain nucleic acid. The ratio of condensed polynucleotide to lipid preparation can vary 
but will generally be around 1:1 (mg DNA:micromoles lipid), or more of lipid. For a review of the
use of liposomes as carriers for delivery of nucleic acids, see Hug and Sleight (1991) Biochim.
Biophys. Acta 1097:1-17; Straubinger (1983) Meth. Enzymol. 101:512-527.

Liposomal preparations for use in the present invention include cationic (positively charged), 
anionic (negatively charged) and neutral preparations. Cationic liposomes have been shown to 
mediate intracellular delivery of plasmid DNA (Felgner (1987) Proc. Natl. Acad. Sci. USA
84:7413-7416); mRNA (Malone (1989) Proc. Natl. Acad. Sci. USA 86:6077-6081); and purified
transcription factors (Debs (1990) J. Biol. Chem. 265:10189-10192), in functional form.

Cationic liposomes are readily available. For example, N[1-(2,3-dioleyloxy)propyl]-N,N,N-triethylammonium
(DOTMA) liposomes are available under the trademark Lipofectin, from GIBCO BRL, Grand
Island, NY. (See, also, Felgner supra). Other commercially available liposomes include
transfectace (DDAB/DOPE) and DOTAP/DOPE (Boehringer). Other cationic liposomes can be
prepared from readily available materials using techniques well known in the art. See, eg. Szoka
(1978) Proc. Natl. Acad. Sci. USA 75:4194-4198; WO90/11092 for a description of the synthesis
of DOTAP (1,2-bis(oleoyloxy)-3-(trimethylammonio)propane) liposomes.

Similarly, anionic and neutral liposomes are readily available, such as from Avanti Polar Lipids
(Birmingham, AL), or can be easily prepared using readily available materials. Such materials
include phosphatidyl choline, cholesterol, phosphatidyl ethanolamine, dioleoylphosphatidyl choline
(DOPC), dioleoylphosphatidyl glycerol (DOPG), dioleoylphosphatidyl ethanolamine (DOPE),
among others. These materials can also be mixed with the DOTMA and DOTAP starting materials
in appropriate ratios. Methods for making liposomes using these materials are well known in the 
art. 

The liposomes can comprise multilamellar vesicles (MLVs), small unilamellar vesicles (SUVs),
or large unilamellar vesicles (LUVs). The various liposome-nucleic acid complexes are prepared
using methods known in the art. See eg. Straubinger (1983) Meth. Immunol. 101:512-527; Szoka
(1978) Proc. Natl. Acad. Sci. USA 75:4194-4198; Papahadjopoulos (1975) Biochim. Biophys. Acta
394:483; Wilson (1979) Cell 17:77; Deamer & Bangham (1976) Biochim. Biophys. Acta 443:629;
Ostro (1977) Biochem. Biophys. Res. Commun. 76:836; Fraley (1979) Proc. Natl. Acad. Sci. USA
76:3348; Enoch & Strittmatter (1979) Proc. Natl. Acad. Sci. USA 76:145; Fraley (1980) J. Biol.
Chem. 255:10431; Szoka & Papahadjopoulos (1978) Proc. Natl. Acad. Sci. USA 75:145;
and Schaefer-Ridder (1982) Science 215:166.

E. Lipoproteins

In addition, lipoproteins can be included with the polynucleotide/polypeptide to be delivered. 
Examples of lipoproteins to be utilized include: chylomicrons, HDL, IDL, LDL, and VLDL. 
Mutants, fragments, or fusions of these proteins can also be used. Also, modifications of naturally
occurring lipoproteins can be used, such as acetylated LDL. These lipoproteins can target the
delivery of polynucleotides to cells expressing lipoprotein receptors. Preferably, if lipoproteins are
included with the polynucleotide to be delivered, no other targeting ligand is included in the
composition. 

Naturally occurring lipoproteins comprise a lipid and a protein portion. The protein portions are
known as apoproteins. At present, apoproteins A, B, C, D, and E have been isolated and
identified. At least two of these contain several proteins, designated by Roman numerals, AI, AII,
AIV; CI, CII, CIII.

A lipoprotein can comprise more than one apoprotein. For example, naturally occurring
chylomicrons comprise A, B, C, and E; over time these lipoproteins lose A and acquire C and
E apoproteins. VLDL comprises A, B, C, and E apoproteins; LDL comprises apoprotein B; and
HDL comprises apoproteins A, C, and E.

The amino acid sequences of these apoproteins are known and are described in, for example,
Breslow (1985) Annu Rev. Biochem 54:699; Law (1986) Adv. Exp Med. Biol. 151:162; Chen (1986)
J Biol Chem 261:12918; Kane (1980) Proc Natl Acad Sci USA 77:2465; and Utermann (1984) Hum
Genet 65:232.

Lipoproteins contain a variety of lipids including triglycerides, cholesterol (free and esters), and
phospholipids. The composition of the lipids varies in naturally occurring lipoproteins. For example,
chylomicrons comprise mainly triglycerides. A more detailed description of the lipid content of
naturally occurring lipoproteins can be found, for example, in Meth. Enzymol. 128 (1986). The
composition of the lipids is chosen to aid in conformation of the apoprotein for receptor binding
activity. The composition of lipids can also be chosen to facilitate hydrophobic interaction and
association with the polynucleotide binding molecule.

Naturally occurring lipoproteins can be isolated from serum by ultracentrifugation, for instance.
Such methods are described in Meth. Enzymol. (supra); Pitas (1980) J. Biochem. 255:5454-5460
and Mahley (1979) J Clin. Invest 64:743-750. Lipoproteins can also be produced by in vitro or
recombinant methods by expression of the apoprotein genes in a desired host cell. See, for example,
Atkinson (1986) Annu Rev Biophys Chem 15:403 and Radding (1958) Biochim Biophys Acta
30:443. Lipoproteins can also be purchased from commercial suppliers, such as Biomedical
Technologies, Inc., Stoughton, Massachusetts, USA. Further description of lipoproteins can be
found in Zuckermann et al PCT/US97/14465.

F. Polycationic Agents

Polycationic agents can be included, with or without lipoprotein, in a composition with the desired 
polynucleotide/polypeptide to be delivered. 

Polycationic agents typically exhibit a net positive charge at physiologically relevant pH and are
capable of neutralizing the electrical charge of nucleic acids to facilitate delivery to a desired
location. These agents have in vitro, ex vivo, and in vivo applications. Polycationic agents can
be used to deliver nucleic acids to a living subject either intramuscularly, subcutaneously, etc.

The following are examples of useful polypeptides as polycationic agents: polylysine, polyarginine,
polyornithine, and protamine. Other examples include histones, protamines, human serum albumin,
DNA binding proteins, non-histone chromosomal proteins, and coat proteins from DNA viruses, such
as φX174. Transcriptional factors also contain domains that bind DNA and therefore may be useful
as nucleic acid condensing agents. Briefly, transcriptional factors such as C/CEBP, c-jun, c-fos,
AP-1, AP-2, AP-3, CPF, Prot-1, Sp-1, Oct-1, Oct-2, CREP, and TFIID contain basic domains that
bind DNA sequences.

Organic polycationic agents include: spermine, spermidine, and putrescine.

The dimensions and the physical properties of a polycationic agent can be extrapolated from the
list above, to construct other polypeptide polycationic agents or to produce synthetic polycationic
agents.

Synthetic polycationic agents which are useful include, for example, DEAE-dextran and polybrene.
Lipofectin™ and lipofectAMINE™ are monomers that form polycationic complexes when
combined with polynucleotides/polypeptides.

Immunodiagnostic Assays

Neisserial antigens of the invention can be used in immunoassays to detect antibody levels (or, 
conversely, anti-Neisserial antibodies can be used to detect antigen levels). Immunoassays based 
on well defined, recombinant antigens can be developed to replace invasive diagnostic methods.
Antibodies to Neisserial proteins within biological samples, including for example, blood or serum
samples, can be detected. Design of the immunoassays is subject to a great deal of variation, and
a variety of these are known in the art. Protocols for the immunoassay may be based, for example, 
upon competition, or direct reaction, or sandwich type assays. Protocols may also, for example, use 
solid supports, or may be by immunoprecipitation. Most assays involve the use of labeled antibody 
or polypeptide; the labels may be, for example, fluorescent, chemiluminescent, radioactive, or dye
molecules. Assays which amplify the signals from the probe are also known; examples of which
are assays which utilize biotin and avidin, and enzyme-labeled and mediated immunoassays, such
as ELISA assays. 

Kits suitable for immunodiagnosis and containing the appropriate labeled reagents are constructed 
by packaging the appropriate materials, including the compositions of the invention, in suitable 
containers, along with the remaining reagents and materials (for example, suitable buffers, salt
solutions, etc.) required for the conduct of the assay, as well as a suitable set of assay instructions.

Nucleic Acid Hybridisation 

"Hybridization" refers to the association of two nucleic acid sequences to one another by hydrogen 
bonding. Typically, one sequence will be fixed to a solid support and the other will be free in
solution. Then, the two sequences will be placed in contact with one another under conditions that 
favor hydrogen bonding. Factors that affect this bonding include: the type and volume of solvent; 
reaction temperature; time of hybridization; agitation; agents to block the non-specific attachment 
of the liquid phase sequence to the solid support (Denhardt's reagent or BLOTTO); concentration 
of the sequences; use of compounds to increase the rate of association of sequences (dextran sulfate
or polyethylene glycol); and the stringency of the washing conditions following hybridization. See 
Sambrook et al [supra] Volume 2, chapter 9, pages 9.47 to 9.57. 

"Stringency" refers to conditions in a hybridization reaction that favor association of very similar 
sequences over sequences that differ. For example, the combination of temperature and salt
concentration should be chosen that is approximately 12° to 20°C below the calculated Tm of the
hybrid under study. The temperature and salt conditions can often be determined empirically in 
preliminary experiments in which samples of genomic DNA immobilized on filters are hybridized 
to the sequence of interest and then washed under conditions of different stringencies. See 
Sambrook et al. at page 9.50. 

Variables to consider when performing, for example, a Southern blot are (1) the complexity of the
DNA being blotted and (2) the homology between the probe and the sequences being detected. The
total amount of the fragment(s) to be studied can vary by a magnitude of 10, from 0.1 to 1 µg for a
plasmid or phage digest to 10⁻⁹ to 10⁻⁸ g for a single copy gene in a highly complex eukaryotic
genome. For lower complexity polynucleotides, substantially shorter blotting, hybridization, and
exposure times, a smaller amount of starting polynucleotides, and lower specific activity of probes
can be used. For example, a single-copy yeast gene can be detected with an exposure time of only
1 hour starting with 1 µg of yeast DNA, blotting for two hours, and hybridizing for 4-8 hours with
a probe of 10⁸ cpm/µg. For a single-copy mammalian gene a conservative approach would start
with 10 µg of DNA, blot overnight, and hybridize overnight in the presence of 10% dextran sulfate
using a probe of greater than 10⁸ cpm/µg, resulting in an exposure time of ~24 hours.

Several factors can affect the melting temperature (Tm) of a DNA-DNA hybrid between the probe
and the fragment of interest, and consequently, the appropriate conditions for hybridization and
washing. In many cases the probe is not 100% homologous to the fragment. Other commonly
encountered variables include the length and total G+C content of the hybridizing sequences and
the ionic strength and formamide content of the hybridization buffer. The effects of all of these
factors can be approximated by a single equation:

$$T_m = 81 + 16.6\,\log_{10}(C_i) + 0.4\,[\%(G+C)] - 0.6\,(\%\,\text{formamide}) - \frac{600}{n} - 1.5\,(\%\,\text{mismatch})$$

where Ci is the salt concentration (monovalent ions) and n is the length of the hybrid in base pairs 
(slightly modified from Meinkoth & Wahl (1984) Anal Biochem. 138: 267-284). 
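As a worked illustration, the approximation above can be evaluated numerically. The following is a minimal sketch that uses the coefficients exactly as they appear in the equation; the function name, the parameter names and the example values are illustrative assumptions and are not taken from the cited reference.

```python
import math


def melting_temperature(salt_molar, gc_percent, formamide_percent,
                        length_bp, mismatch_percent=0.0):
    """Approximate Tm (deg C) of a DNA-DNA hybrid.

    Tm = 81 + 16.6*log10(Ci) + 0.4*%(G+C) - 0.6*%formamide
         - 600/n - 1.5*%mismatch
    where Ci is the monovalent salt concentration (mol/l) and n is the
    hybrid length in base pairs.
    """
    if salt_molar <= 0:
        raise ValueError("salt concentration must be positive")
    if length_bp <= 0:
        raise ValueError("hybrid length must be positive")
    return (81.0
            + 16.6 * math.log10(salt_molar)
            + 0.4 * gc_percent
            - 0.6 * formamide_percent
            - 600.0 / length_bp
            - 1.5 * mismatch_percent)


# Hypothetical example: 1 M salt, 50% G+C, 50% formamide, 300 bp hybrid.
perfect = melting_temperature(1.0, 50.0, 50.0, 300)
mismatched = melting_temperature(1.0, 50.0, 50.0, 300, mismatch_percent=5.0)
print(round(perfect, 1), round(mismatched, 1))
```

In this example each 1% of mismatch lowers the estimate by 1.5°C, which is why the hybridization temperature has to be reduced for probes that are not fully homologous.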

In designing a hybridization experiment, some factors affecting nucleic acid hybridization can be
conveniently altered. The temperature of the hybridization and washes and the salt concentration 
during the washes are the simplest to adjust. As the temperature of the hybridization increases (ie. 
stringency), it becomes less likely for hybridization to occur between strands that are 
nonhomologous, and as a result, background decreases. If the radiolabeled probe is not completely
homologous with the immobilized fragment (as is frequently the case in gene family and
interspecies hybridization experiments), the hybridization temperature must be reduced, and 
background will increase. The temperature of the washes affects the intensity of the hybridizing 
band and the degree of background in a similar manner. The stringency of the washes is also 
increased with decreasing salt concentrations. 

In general, convenient hybridization temperatures in the presence of 50% formamide are 42°C for
a probe which is 95% to 100% homologous to the target fragment, 37°C for 90% to 95% homology,
and 32°C for 85% to 90% homology. For lower homologies, formamide content should be lowered
and temperature adjusted accordingly, using the equation above. If the homology between the probe
and the target fragment is not known, the simplest approach is to start with both hybridization and
wash conditions which are nonstringent. If non-specific bands or high background are observed
after autoradiography, the filter can be washed at high stringency and reexposed. If the time
required for exposure makes this approach impractical, several hybridization and/or washing
stringencies should be tested in parallel.
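The temperature guidance for 50% formamide can be summarised as a simple lookup. This is a sketch only: the function name is hypothetical, and assigning the boundary values (95% and 90%) to the higher temperature band is an assumption, since the text does not say which band they belong to.

```python
def hybridization_temperature_c(homology_percent):
    """Suggested hybridization temperature (deg C) in 50% formamide.

    Returns None below 85% homology, where the text advises lowering the
    formamide content and recalculating from the Tm equation instead.
    """
    if not 0 <= homology_percent <= 100:
        raise ValueError("homology must be between 0 and 100 percent")
    if homology_percent >= 95:
        return 42
    if homology_percent >= 90:
        return 37
    if homology_percent >= 85:
        return 32
    return None


for homology in (98, 92, 87, 70):
    print(homology, hybridization_temperature_c(homology))
```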

Nucleic Acid Probe Assays

Methods such as PCR, branched DNA probe assays, or blotting techniques utilizing nucleic acid 
probes according to the invention can determine the presence of cDNA or mRNA. A probe is said 
to "hybridize" with a sequence of the invention if it can form a duplex or double stranded complex, 
which is stable enough to be detected.

The nucleic acid probes will hybridize to the Neisserial nucleotide sequences of the invention 
(including both sense and antisense strands). Though many different nucleotide sequences will 
encode the amino acid sequence, the native Neisserial sequence is preferred because it is the actual 
sequence present in cells. mRNA represents a coding sequence and so a probe should be 
complementary to the coding sequence; single-stranded cDNA is complementary to mRNA, and
so a cDNA probe should be complementary to the non-coding sequence. 
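The strand relationships described above can be made concrete with a short sketch. It assumes the coding (sense) strand is supplied 5' to 3' as a DNA string; the function names and the six-base example sequence are hypothetical and are not sequences of the invention.

```python
# Translation table for the Watson-Crick complement of DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def probe_for_target(coding_strand, target):
    """Return a DNA probe (5' to 3') able to hybridize to the given target.

    mRNA carries the coding sequence, so its probe is complementary to the
    coding strand. Single-stranded cDNA is complementary to the mRNA, so its
    probe is complementary to the non-coding strand, i.e. it has the
    sequence of the coding strand itself.
    """
    if target == "mRNA":
        return reverse_complement(coding_strand)
    if target == "cDNA":
        return coding_strand.upper()
    raise ValueError("target must be 'mRNA' or 'cDNA'")


coding = "ATGAAA"  # hypothetical fragment, for illustration only
print(probe_for_target(coding, "mRNA"))
print(probe_for_target(coding, "cDNA"))
```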

The probe sequence need not be identical to the Neisserial sequence (or its complement) — some 
variation in the sequence and length can lead to increased assay sensitivity if the nucleic acid probe 
can form a duplex with target nucleotides, which can be detected. Also, the nucleic acid probe can
include additional nucleotides to stabilize the formed duplex. Additional Neisserial sequence may
also be helpful as a label to detect the formed duplex. For example, a non-complementary 
nucleotide sequence may be attached to the 5' end of the probe, with the remainder of the probe 
sequence being complementary to a Neisserial sequence. Alternatively, non-complementary bases 
or longer sequences can be interspersed into the probe, provided that the probe sequence has
sufficient complementarity with a Neisserial sequence in order to hybridize therewith and
thereby form a duplex which can be detected. 

The exact length and sequence of the probe will depend on the hybridization conditions, such as 
temperature, salt condition and the like. For example, for diagnostic applications, depending on the 
complexity of the analyte sequence, the nucleic acid probe typically contains at least 10-20 
nucleotides, preferably 15-25, and more preferably at least 30 nucleotides, although it may be
shorter than this. Short primers generally require cooler temperatures to form sufficiently stable
hybrid complexes with the template.

Probes may be produced by synthetic procedures, such as the triester method of Matteucci et al.
[J. Am. Chem. Soc. (1981) 103:3185], or according to Urdea et al [Proc. Natl. Acad. Sci. USA 
(1983) 80: 7461], or using commercially available automated oligonucleotide synthesizers. 

The chemical nature of the probe can be selected according to preference. For certain applications, 
DNA or RNA are appropriate. For other applications, modifications may be incorporated eg.
backbone modifications, such as phosphorothioates or methylphosphonates, can be used to increase 
in vivo half-life, alter RNA affinity, increase nuclease resistance etc. [eg. see Agrawal & Iyer 
(1995) Curr Opin Biotechnol 6:12-19; Agrawal (1996) TIBTECH 14:376-387]; analogues such as 
peptide nucleic acids may also be used [eg. see Corey (1997) TIBTECH 15:224-229; Buchardt et 
al (1993) TIBTECH 11:384-386].

Alternatively, the polymerase chain reaction (PCR) is another well-known means for detecting 
small amounts of target nucleic acids. The assay is described in: Mullis et al [Meth. Enzymol 
(1987) 155: 335-350]; US patents 4,683,195 and 4,683,202. Two "primer" nucleotides hybridize 
with the target nucleic acids and are used to prime the reaction. The primers can comprise sequence 
that does not hybridize to the sequence of the amplification target (or its complement) to aid with
duplex stability or, for example, to incorporate a convenient restriction site. Typically, such 
sequence will flank the desired Neisserial sequence. 
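A primer pair carrying non-hybridizing 5' tails can be sketched as follows. This is an illustration under stated assumptions: the function names, the primer length and the 18-base target are hypothetical, and the tails used in the example are the recognition sequences of EcoRI (GAATTC) and BamHI (GGATCC), chosen only as familiar restriction sites.

```python
# Translation table for the Watson-Crick complement of DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_primers(target, primer_len=18, forward_tail="", reverse_tail=""):
    """Return (forward, reverse) primers for amplifying the whole target.

    The 3' part of each primer anneals to one end of the target; the
    optional tail is added at the 5' end, does not hybridize, and therefore
    ends up flanking the amplified sequence.
    """
    target = target.upper()
    if primer_len <= 0 or len(target) < 2 * primer_len:
        raise ValueError("target too short for the requested primer length")
    forward = forward_tail.upper() + target[:primer_len]
    reverse = reverse_tail.upper() + reverse_complement(target[-primer_len:])
    return forward, reverse


# Hypothetical target, for illustration only.
fwd, rev = design_primers("ATGAAACCCGGGTTTTAG", primer_len=6,
                          forward_tail="GAATTC", reverse_tail="GGATCC")
print(fwd)
print(rev)
```

In practice a few extra bases are often placed 5' of a restriction site so that the enzyme cuts efficiently near the end of the product; that refinement is omitted here for brevity.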

A thermostable polymerase creates copies of target nucleic acids from the primers using the 
original target nucleic acids as a template. After a threshold amount of target nucleic acids is
generated by the polymerase, they can be detected by more traditional methods, such as Southern
blots. When using the Southern blot method, the labelled probe will hybridize to the Neisserial 
sequence (or its complement). 
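How many cycles are needed to reach a detectable amount can be estimated with a simple growth model. The sketch below assumes idealised amplification in which each cycle multiplies the copy number by (1 + efficiency), with perfect doubling at an efficiency of 1.0; the function names, the efficiency parameter and the example threshold are assumptions for illustration and are not values given in the text.

```python
def copies_after(initial_copies, cycles, efficiency=1.0):
    """Expected copy number after a given number of amplification cycles."""
    if initial_copies < 0 or cycles < 0:
        raise ValueError("copies and cycles must be non-negative")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the interval (0, 1]")
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_to_reach(initial_copies, threshold_copies, efficiency=1.0):
    """Smallest number of cycles giving at least threshold_copies."""
    if initial_copies <= 0:
        raise ValueError("at least one starting copy is required")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the interval (0, 1]")
    copies = float(initial_copies)
    cycles = 0
    # Multiply cycle by cycle to avoid rounding problems with logarithms.
    while copies < threshold_copies:
        copies *= 1.0 + efficiency
        cycles += 1
    return cycles


print(copies_after(10, 3))
print(cycles_to_reach(1, 1_000_000))
```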

Also, mRNA or cDNA can be detected by traditional blotting techniques described in Sambrook 
et al [supra]. mRNA, or cDNA generated from mRNA using a polymerase enzyme, can be purified 
and separated using gel electrophoresis. The nucleic acids on the gel are then blotted onto a solid
support, such as nitrocellulose. The solid support is exposed to a labelled probe and then washed
to remove any unhybridized probe. Next, the duplexes containing the labeled probe are detected.
Typically, the probe is labelled with a radioactive moiety.

BRIEF DESCRIPTION OF THE DRAWINGS

Figures 1-20 show biochemical data obtained in the Examples, and also sequence analysis, for 
ORFs 37, 5, 2, 15, 22, 28, 32, 4, 61, 76, 89, 97, 106, 138, 23, 25, 27, 79, 85 and 132. Ml and M2 
are molecular weight markers. Arrows indicate the position of the main recombinant product or, 
in Western blots, the position of the main N. meningitidis immunoreactive band. TP indicates
N. meningitidis total protein extract; OMV indicates N. meningitidis outer membrane vesicle
preparation. In bactericidal assay results: a diamond (♦) shows preimmune data; a triangle (▲)
shows GST control data; a circle (•) shows data with recombinant N. meningitidis protein.
Computer analyses show a hydrophilicity plot (upper), an antigenic index plot (middle), and an
AMPHI analysis (lower). The AMPHI program has been used to predict T-cell epitopes [Gao et
al (1989) J. Immunol 143:3007; Roberts et al (1996) AIDS Res Hum Retrovir 12:593; Quakyi et
al (1992) Scand J Immunol suppl. 11:9] and is available in the Protean package of DNASTAR, Inc.
(1228 South Park Street, Madison, Wisconsin 53715 USA). 

Figure 21 shows an alignment comparison of amino acid sequences for ORF 4 for several strains 
of Neisseria. Dark shading indicates regions of homology, and gray shading indicates the
conservation of amino acids with similar characteristics. The Figure demonstrates a high degree 
of conservation among the various strains, further confirming its utility as an antigen for both 
vaccines and diagnostics. 

EXAMPLES

The examples describe nucleic acid sequences which have been identified in N. meningitidis, along 
with their putative translation products, and also those of N. gonorrhoeae. Not all of the nucleic acid 
sequences are complete ie. they encode less than the full-length wild-type protein. 

The examples are generally in the following format:
• a nucleotide sequence which has been identified in N. meningitidis (strain B)

• the putative translation product of this sequence

• a computer analysis of the translation product based on database comparisons

• corresponding gene and protein sequences identified in N. meningitidis (strain A) and in
N. gonorrhoeae

• a description of the characteristics of the proteins which indicates that they might be
suitably antigenic

• results of biochemical analysis (expression, purification, ELISA, FACS etc.)

The examples typically include details of sequence identity between species and strains. Proteins
that are similar in sequence are generally similar in both structure and function, and the sequence
identity often indicates a common evolutionary origin. Comparison with sequences of proteins of
known function is widely used as a guide for the assignment of putative protein function to a new
sequence and has proved particularly useful in whole-genome analyses.

Sequence comparisons were performed at NCBI (http://www.ncbi.nlm.nih.gov) using the
algorithms BLAST, BLAST2, BLASTn, BLASTp, tBLASTn, BLASTx, & tBLASTx [eg. see also
Altschul et al (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database
search programs. Nucleic Acids Research 25:3389-3402]. Searches were performed against the
following databases: non-redundant GenBank+EMBL+DDBJ+PDB sequences and non-redundant
GenBank CDS translations+PDB+SwissProt+SPupdate+PIR sequences.

To compare Meningococcal and Gonococcal sequences, the tBLASTx algorithm was used, as
implemented at http://www.genome.ou.edu/gono_blast.html. The FASTA algorithm was also used
to compare the ORFs (from GCG Wisconsin Package, version 9.0).

Dots within nucleotide sequences (eg. position 495 in SEQ ID 11) represent nucleotides which have
been arbitrarily introduced in order to maintain a reading frame. In the same way,
double-underlined nucleotides were removed. Lower case letters (eg. position 496 in SEQ ID 11) represent
ambiguities which arose during alignment of independent sequencing reactions (some of the
nucleotide sequences in the examples are derived from combining the results of two or more
experiments).

Nucleotide sequences were scanned in all six reading frames to predict the presence of hydrophobic
domains using an algorithm based on the statistical studies of Esposti et al. [Critical evaluation of
the hydropathy of membrane proteins (1990) Eur J Biochem 190:207-219]. These domains
represent potential transmembrane regions or hydrophobic leader sequences.

Open reading frames were predicted from fragmented nucleotide sequences using the program
ORFFINDER (NCBI).
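
The idea of scanning a nucleotide sequence in all six reading frames can be illustrated with a minimal sketch. This is not the ORFFINDER program and not the hydropathy algorithm referred to above; the function names, the ATG-only start rule and the `min_codons` threshold are assumptions made purely for illustration.

```python
# Illustrative six-frame ORF scan (hypothetical helper, not NCBI ORFFINDER).
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_orfs(seq, min_codons=3):
    """List (strand, frame, start, end, orf) for ATG...stop in six frames.

    Coordinates are 0-based on the strand being read; `end` is exclusive
    and the stop codon is included in the returned ORF.
    """
    results = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # open a candidate ORF
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        results.append((strand, frame, start, i + 3, s[start:i + 3]))
                    start = None                   # close it and keep scanning
    return results
```

An ORF that runs off the end of a fragment without a stop codon is deliberately not reported by this sketch, whereas a real tool working on fragmented sequences would usually report such partial ORFs.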

Underlined amino acid sequences indicate possible transmembrane domains or leader sequences
in the ORFs, as predicted by the PSORT algorithm (http://www.psort.nibb.ac.jp). Functional
domains were also predicted using the MOTIFS program (GCG Wisconsin & PROSITE).

Various tests can be used to assess the in vivo immunogenicity of the proteins identified in the
examples. For example, the proteins can be expressed recombinantly and used to screen patient sera
by immunoblot. A positive reaction between the protein and patient serum indicates that the patient
has previously mounted an immune response to the protein in question ie. the protein is an
immunogen. This method can also be used to identify immunodominant proteins.

The recombinant protein can also be conveniently used to prepare antibodies eg. in a mouse. These 
can be used for direct confirmation that a protein is located on the cell-surface. Labelled antibody 
(eg. fluorescent labelling for FACS) can be incubated with intact bacteria and the presence of label 
on the bacterial surface confirms the location of the protein. 

In particular, the following methods (A) to (S) were used to express, purify and biochemically
characterise the proteins of the invention:

A) Chromosomal DNA preparation 

N. meningitidis strain 2996 was grown to exponential phase in 100ml of GC medium, harvested by
centrifugation, and resuspended in 5ml buffer (20% sucrose, 50mM Tris-HCl, 50mM EDTA, pH 8).
After 10 minutes incubation on ice, the bacteria were lysed by adding 10ml lysis solution (50mM
NaCl, 1% Na-Sarkosyl, 50µg/ml Proteinase K), and the suspension was incubated at 37°C for 2
hours. Two phenol extractions (equilibrated to pH 8) and one CHCl3/isoamylalcohol (24:1)
extraction were performed. DNA was precipitated by addition of 0.3M sodium acetate and 2
volumes ethanol, and was collected by centrifugation. The pellet was washed once with 70%
ethanol and redissolved in 4ml buffer (10mM Tris-HCl, 1mM EDTA, pH 8). The DNA
concentration was measured by reading the OD at 260 nm.
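
As a hedged illustration of how an OD reading at 260 nm is commonly converted to a DNA concentration: the sketch below assumes the usual convention that one absorbance unit corresponds to about 50µg/ml of double-stranded DNA in a 1 cm cuvette. That factor, the function name and the dilution parameter are assumptions and are not stated in the text.

```python
# Assumed convention (not stated in the source): 1.0 A260 ~ 50 ug/ml dsDNA, 1 cm path.
DSDNA_UG_PER_ML_PER_A260 = 50.0


def dsdna_conc_ug_per_ml(a260, dilution_factor=1.0):
    """Estimate the dsDNA concentration of the undiluted sample in ug/ml."""
    if a260 < 0 or dilution_factor <= 0:
        raise ValueError("absorbance must be >= 0 and dilution factor > 0")
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor
```

For example, a reading of 0.2 on a 1:50 dilution would correspond to roughly 500µg/ml under these assumptions.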

B) Oligonucleotide design

Synthetic oligonucleotide primers were designed on the basis of the coding sequence of each ORF,
using (a) the meningococcus B sequence when available, or (b) the gonococcus/meningococcus A
sequence, adapted to the codon preference of meningococcus as necessary. Any predicted
signal peptides were omitted, by deducing the 5'-end amplification primer sequence immediately
downstream from the predicted leader sequence.

For most ORFs, the 5' primers included two restriction enzyme recognition sites (BamHI-NdeI,
BamHI-NheI, or EcoRI-NheI, depending on the gene's own restriction pattern); the 3' primers
included a XhoI restriction site. This procedure was established in order to direct the cloning of
each amplification product (corresponding to each ORF) into two different expression systems:
pGEX-KG (using either BamHI-XhoI or EcoRI-XhoI), and pET21b+ (using either NdeI-XhoI or
NheI-XhoI):

5'-end primer tail: CGCGGATCCCATATG (BamHI-NdeI)
                    CGCGGATCCGCTAGC (BamHI-NheI)
                    CCGGAATTCTAGCTAGC (EcoRI-NheI)

3'-end primer tail: CCCGCTCGAG (XhoI)

For ORFs 5, 15, 17, 19, 20, 22, 27, 28, 65 & 89, two different amplifications were performed to
clone each ORF in the two expression systems. Two different 5' primers were used for each ORF;
the same 3' XhoI primer was used as before:

5'-end primer tail: GGAATTCCATATGGCCATGG (NdeI)

5'-end primer tail: CGGGATCC (BamHI)

ORF 76 was cloned in the pTRC expression vector and expressed as an amino-terminus His-tag
fusion. In this particular case, the predicted signal peptide was included in the final product.
NheI-BamHI restriction sites were incorporated using primers:

5'-end primer tail: GATCAGCTAGCCATATG (NheI)

3'-end primer tail: CGGGATCC (BamHI)

As well as containing the restriction enzyme recognition sequences, the primers included
nucleotides which hybridized to the sequence to be amplified. The number of hybridizing
nucleotides depended on the melting temperature of the whole primer, and was determined for each
primer using the formulae:

Tm = 4 (G+C) + 2 (A+T) (tail excluded)

Tm = 64.9 + 0.41 (% GC) - 600/N (whole primer)

The average melting temperatures of the selected oligos were 65-70°C for the whole oligo and
50-55°C for the hybridising region alone.
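
The two melting temperature formulae can be sketched in a few lines of code. The function names are hypothetical, and N is assumed here to be the total number of nucleotides in the primer (tail plus hybridising region); the worked example uses the ORF 4 forward primer listed in Table I.

```python
def tm_hybridising(region):
    """Tm = 4(G+C) + 2(A+T), applied to the hybridising region only (tail excluded)."""
    s = region.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    return 4 * gc + 2 * at


def tm_whole_primer(primer):
    """Tm = 64.9 + 0.41(%GC) - 600/N for the whole primer.

    A '-' separating tail and hybridising region, as written in Table I,
    is ignored. N is assumed to be the primer length in nucleotides.
    """
    s = primer.upper().replace("-", "")
    n = len(s)
    gc_percent = 100.0 * (s.count("G") + s.count("C")) / n
    return 64.9 + 0.41 * gc_percent - 600.0 / n


# ORF 4 forward primer from Table I: tail GCGGATCCCATATG, region TGCGGAGGTCAAAAAGAC
print(tm_hybridising("TGCGGAGGTCAAAAAGAC"))                            # 54
print(round(tm_whole_primer("GCGGATCCCATATG-TGCGGAGGTCAAAAAGAC"), 1))  # 67.9
```

Both values fall within the ranges quoted above (50-55°C for the hybridising region, 65-70°C for the whole oligo).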

Table I shows the forward and reverse primers used for each amplification. In certain cases, it will
be noted that the sequence of the primer does not exactly match the sequence in the ORF. When
initial amplifications were performed, the complete 5' and/or 3' sequence was not known for some
meningococcal ORFs, although the corresponding sequences had been identified in gonococcus.
For amplification, the gonococcal sequences could thus be used as the basis for primer design,
altered to take account of codon preference. In particular, the following codons were changed:
ATA→ATT; TCG→TCT; CAG→CAA; AAG→AAA; GAG→GAA; CGA→CGC; CGG→CGC;
GGG→GGC. Italicised nucleotides in Table I indicate such a change. It will be appreciated that,
once the complete sequence has been identified, this approach is generally no longer necessary.
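
The codon changes listed above are all synonymous substitutions, and applying them to an in-frame coding sequence can be sketched as follows. The function name is hypothetical, and the sketch assumes the input starts at the first base of a codon; it is an illustration of the substitution table, not the procedure actually used to design the primers.

```python
# Codon substitutions exactly as listed in the text (gonococcal -> preferred codon).
CODON_CHANGES = {
    "ATA": "ATT", "TCG": "TCT", "CAG": "CAA", "AAG": "AAA",
    "GAG": "GAA", "CGA": "CGC", "CGG": "CGC", "GGG": "GGC",
}


def adapt_codons(coding_seq):
    """Replace listed codons in an in-frame coding sequence.

    Codons not in the table, and any trailing partial codon, are left unchanged.
    """
    s = coding_seq.upper()
    codons = [s[i:i + 3] for i in range(0, len(s), 3)]
    return "".join(CODON_CHANGES.get(codon, codon) for codon in codons)
```

Because the mapping works codon by codon, a frame shift in the input would silently change which triplets are replaced, so the reading frame should be checked first.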

TABLE I - PCR primers

ORF | Primer | Sequence | Restriction sites
ORF 1 | Forward | CGCGGATCCGCTAGC-GGACACACTTATTTCGG <SEQ ID 924> | BamHI-NheI
ORF 1 | Reverse | CCCGCTCGAG-CCAGCGGTAGCCTAATT <SEQ ID 925> | XhoI
ORF 2 | Forward | GCGGATCCCATATG-TTTGATTTCGGTTTGGG <SEQ ID 926> | BamHI-NdeI
ORF 2 | Reverse | CCCGCTCGAG-GACGGCATAACGGCG <SEQ ID 927> | XhoI
ORF 2-1 | Forward | GCGGATCCCATATG-TTTGATTTCGGTTTGGG <SEQ ID 928> | BamHI-NdeI
ORF 2-1 | Reverse | CCCGCTCGAG-TGATTTACGGACGCGCA <SEQ ID 929> | XhoI
ORF 4 | Forward | GCGGATCCCATATG-TGCGGAGGTCAAAAAGAC <SEQ ID 930> | BamHI-NdeI
ORF 4 | Reverse | CCCGCTCGAG-TTTGGCTGCGCCTTC <SEQ ID 931> | XhoI
ORF 5 | Forward | GGAATTCCATATGGCCATGG-TGGAAGGCGCACAACC <SEQ ID 932> | NdeI-NcoI
ORF 5 | Forward | CGGGATCC-ATGGAAGGCGCACAAC <SEQ ID 933> | BamHI
ORF 5 | Reverse | CCCGCTCGAG-GACTGTGCAAAAACGG <SEQ ID 934> | XhoI
ORF 6 | Forward | CGCGGATCCCATATG-[illegible] <SEQ ID 935> | BamHI-NdeI
ORF 6 | Reverse | CCCGCTCGAG-[illegible] <SEQ ID 936> | XhoI
ORF 7 | Forward | CGCGGATCCGCTAGC-GCGCTGCTTTTTGTTCC <SEQ ID 937> | BamHI-NheI
ORF 7 | Reverse | CCCGCTCGAG-TTTCAAAA[illegible] <SEQ ID 938> | XhoI
ORF 8 | Forward | GCGGATCCCATATG-GCTCAACTGCTTCGTAC <SEQ ID 939> | BamHI-NdeI
ORF 8 | Reverse | CCCGCTCGAG-AGCAGGCTTTGGCGC <SEQ ID 940> | XhoI
ORF 9 | Forward | CGCGGATCCCATATG-[illegible] <SEQ ID 941> | BamHI-NdeI
ORF 9 | Reverse | CCCGCTCGAG-TTTCCGA[illegible] <SEQ ID 942> | XhoI
ORF 10 | Forward | GCGGATCCCATATG-GACACAAAAGAAATCCTC <SEQ ID 943> | BamHI-NdeI
ORF 10 | Reverse | CCCGCTCGAG-TAATGGGAAACCTTGTTTT <SEQ ID 944> | XhoI
ORF 11 | Forward | GCGGATCCCATATG-GCGGTCAACCTCTACG <SEQ ID 945> | BamHI-NdeI
ORF 11 | Reverse | CCCGCTCGAG-GGAAACGACTTCGCC <SEQ ID 946> | XhoI
ORF 13 | Forward | CGCGGATCCCATATG-GCTCTGCTTTCCGCGC <SEQ ID 947> | BamHI-NdeI
ORF 13 | Reverse | CCCGCTCGAG-AGGGTGTGTGATAATAAG <SEQ ID 948> | XhoI
ORF 15 | Forward | GGAATTCCATATGGCCATGG-GCGGGACACTGACAG <SEQ ID 949> | NdeI-NcoI
ORF 15 | Forward | CGGGATCC-TGCGGGACACTGACAGG <SEQ ID 950> | BamHI
ORF 15 | Reverse | CCCGCTCGAG-AGGTTGGCCTTGTCTATG <SEQ ID 951> | XhoI
ORF 17 | Forward | GGAATTCCATATGGCCATGG-TTGCCGGCCTGTTCG <SEQ ID 952> | NdeI-NcoI
ORF 17 | Forward | CGGGATCC-ATTGCCGGCCTGTTCG <SEQ ID 953> | BamHI
ORF 17 | Reverse | CCCGCTCGAG-AAGCAGGTTGTACAGC <SEQ ID 954> | XhoI
ORF 18 | Forward | GCGGATCCCATATG-ATTTTGCTGCATTTGGAT <SEQ ID 955> | BamHI-NdeI
ORF 18 | Reverse | CCCGCTCGAG-TCTTCCAATTTCTGAAAGC <SEQ ID 956> | XhoI
ORF 19 | Forward | GGAATTCCATATGGCCATGG-TCGCCAGTGTTTTTACC <SEQ ID 957> | NdeI-NcoI
ORF 19 | Forward | CGGGATCC-TTCGCCAGTGTTTTTACCG <SEQ ID 958> | BamHI
ORF 19 | Reverse | CCCGCTCGAG-GGTGTTTTTGAAGCTGCC <SEQ ID 959> | XhoI
ORF 20 | Forward | GGAATTCCATATGGCCATGG-TCGGCGCGGGTATG <SEQ ID 960> | NdeI-NcoI
ORF 20 | Forward | CGGGATCC-TTCGGCGCGGGTATG <SEQ ID 961> | BamHI
ORF 20 | Reverse | CCCGCTCGAG-CGGCGAGCGAGAGCA <SEQ ID 962> | XhoI
ORF 22 | Forward | GGAATTCCATATGGCCATGG-TGATTAAAATCAAAAAAGGTCT | NdeI-NcoI
ORF 22 | Forward | CGGGATCC-ATGATTAAAATCAAAAAAGGTCTAAACC <SEQ ID 964> | BamHI
ORF 22 | Reverse | CCCGCTCGAG-ATTATGATAGCGGCCC <SEQ ID 965> | XhoI
[illegible] | Forward | CGCGGATCCCATATG-GATGTTTCTGTTTCAGAC <SEQ ID 966> | BamHI-NdeI
[illegible] | Reverse | CCCGCTCGAG-TTTAAACCGATAGGTAAACG <SEQ ID 967> | XhoI
ORF 24 | Forward | GGAATTCCATATGGCCATGG-TGATGCCGGAAATGGTG <SEQ ID 968> | NdeI-NcoI
ORF 24 | Forward | CGGGATCC-ATGATGCCGGAAATGGTG <SEQ ID 969> | BamHI
ORF 24 | Reverse | CCCGCTCGAG-TGTCAGCGTGGCGCA <SEQ ID 970> | XhoI
ORF 25 | Forward | GCGGATCCCATATG-TATCGCAAACTGATTGC <SEQ ID 971> | BamHI-NdeI
ORF 25 | Reverse | CCCGCTCGAG-ATCGATGGAATAGCCG <SEQ ID 972> | XhoI
ORF 26 | Forward | GCGGATCCCATATG-CAGCTGATCGACTATTC <SEQ ID [illegible]> | BamHI-NdeI
ORF 26 | Reverse | CCCGCTCGAG-GACATCGGCGCGTTTT <SEQ ID 974> | XhoI
ORF 27 | Forward | GGAATTCCATATGGCCATGG-AGACCTATTCTGTTTA <SEQ ID 974> | NdeI-NcoI
ORF 27 | Forward | CGGGATCC-CAGACCTATTCTGTTTATTTTAATC <SEQ ID 975> | BamHI
ORF 27 | Reverse | CCCGCTCGAG-GGGTTCGATTAAATAACCAT <SEQ ID 976> | XhoI
ORF 28 | Forward | GGAATTCCATATGGCCATGG-ACGGCTGTACGTTGATGT <SEQ ID 977> | NdeI-NcoI
ORF 28 | Forward | CGGGATCC-AACGGCTGTACGTTGATG <SEQ ID 978> | BamHI
ORF 28 | Reverse | CCCGCTCGAG-TTTGTCAGAGGAATTCGCG <SEQ ID 979> | XhoI
ORF 29 | Forward | GCGGATCCCATATG-AACGGTTTGGATGCCCG <SEQ ID 980> | BamHI-NdeI
ORF 29 | Forward | CGCGGATCCGCTAGC-AACGGTTTGGATGCCCG <SEQ ID 981> | BamHI-NheI
ORF 29 | Reverse | CCCGCTCGAG-TTTGTCTAAGTTCCTGATATG <SEQ ID 982> | XhoI
ORF 32 | Forward | CGCGGATCCCATATG-AATACTCCTCCTTTTG <SEQ ID 983> | BamHI-NdeI
ORF 32 | Reverse | CCCGCTCGAG-GCGTATTTTTTGATGCTTTG <SEQ ID 984> | XhoI
ORF 33 | Forward | GCGGATCCCATATG-ATTGATAGGGATCGTATG <SEQ ID 985> | BamHI-NdeI
ORF 33 | Reverse | CCCGCTCGAG-TTGATCTTTCAAACGGCC <SEQ ID 986> | XhoI
ORF 35 | Forward | GCGGATCCCATATG-TTCAGAGCTCAGCTT <SEQ ID 987> | BamHI-NdeI
ORF 35 | Forward | CGCGGATCCGCTAGC-TTCAGAGCTCAGCTT <SEQ ID 988> | BamHI-NheI
ORF 35 | Reverse | CCCGCTCGAG-AAACAGCCATTTGAGCGA <SEQ ID 989> | XhoI
ORF 37 | Forward | GCGGATCCCATATG-GATGACGTATCGGATTTT <SEQ ID 990> | BamHI-NdeI
ORF 37 | Reverse | [illegible]-ATAGCCCGCTTTCAGG <SEQ ID 991> | [illegible]
ORF 58 | Forward | CGCGGATCCGCTAGC-TCCGAACGCGAGTGGAT <SEQ ID 992> | BamHI-NheI
ORF 58 | Reverse | CCCGCTCGAG-AGCATTGTCCAAGGGGAC <SEQ ID 993> | XhoI
ORF 65 | Forward | GGAATTCCATATGGCCATGG-TGCTGTATCTGAATCAAG <SEQ ID 994> | NdeI-NcoI
ORF 65 | Forward | CGGGATCC-TTGCTGTATCTGAATCAAGG <SEQ ID 995> | BamHI
ORF 65 | Reverse | CCCGCTCGAG-CCGCATCGGCAGACA <SEQ ID 996> | XhoI
ORF 66 | Forward | GCGGATCCCATATG-TACGCATTTACCGCCG <SEQ ID 997> | BamHI-NdeI
ORF 66 | Reverse | CCCGCTCGAG-TGGATTTTGCAGAGATGG <SEQ ID 998> | XhoI
ORF 72 | Forward | CGCGGATCCCATATG-AATGCAGTAAAAATATCTGA <SEQ ID 999> | BamHI-NdeI
ORF 72 | Reverse | CCCGCTCGAG-GCCTGAGACCTTTGCAA <SEQ ID 1000> | XhoI
ORF 73 | Forward | GCGGATCCCATATG-AGATTTTTCGGTATCGG <SEQ ID 1001> | BamHI-NdeI
ORF 73 | Reverse | CCCGCTCGAG-TTCATCTTTTTCATGTTCG <SEQ ID 1002> | XhoI
ORF 75 | Forward | GCGGATCCCATATG-TCTGTCTTTCAAACGGC <SEQ ID 1003> | BamHI-NdeI
ORF 75 | Reverse | CCCGCTCGAG-TTTGTTTTTGCAAGACAG <SEQ ID 1004> | XhoI
ORF 76 | Forward | [illegible] <SEQ ID 1005> | NheI-NdeI
ORF 76 | Reverse | [illegible] <SEQ ID 1006> | BamHI
ORF 79 | Forward | CGCGGATCCCATATG-GTTTCCGCCGCCG <SEQ ID 1007> | BamHI-NdeI
ORF 79 | Reverse | [illegible] | XhoI
ORF 83 | Forward | GCGGATCCCATATG-AAAACCCTGCTGCTGC <SEQ ID 1009> | BamHI-NdeI
ORF 83 | Reverse | CCCGCTCGAG-GCCGCCTTTGCGGC <SEQ ID 1010> | XhoI
ORF 84 | Forward | GCGGATCCCATATG-GCAGAGATCTGTTTG <SEQ ID 1011> | BamHI-NdeI
ORF 84 | Reverse | CCCGCTCGAG-GTTTGCCGATCCGACCA <SEQ ID 1012> | XhoI
[illegible] | Forward | CGCGGATCCCATATG-GCGGTTTGGGGCGGA <SEQ ID 1013> | BamHI-NdeI
[illegible] | Reverse | CCCGCTCGAG-TCGGCGCGGCGGGC <SEQ ID 1014> | XhoI
ORF 89 | Forward | GGAATTCCATATGGCCATGG-CCATACCTTCTTATCA <SEQ ID 1015> | NdeI-NcoI
ORF 89 | Forward | CGGGATCC-GCCATACCTTCTTATCAGAG <SEQ ID 1016> | BamHI
ORF 89 | Reverse | CCCGCTCGAG-TTTTTTGCGATTAGAAAAAGC <SEQ ID 1017> | XhoI
ORF 97 | Forward | GCGGATCCCATATG-CATCCTGCCAGCGAAC <SEQ ID 1018> | BamHI-NdeI
ORF 97 | Reverse | [illegible] <SEQ ID 1019> | [illegible]
ORF 98 | Forward | GCGGATCCCATATG-ACGGTAACTGCGG <SEQ ID 1020> | BamHI-NdeI
ORF 98 | Reverse | CCCGCTCGAG-TTGTTGTTCGGGCAAATC <SEQ ID 1021> | XhoI
ORF 100 | Forward | GCGGATCCCATATG-TCGGGCATTTACACCG <SEQ ID 1022> | BamHI-NdeI
ORF 100 | Reverse | CCCGCTCGAG-ACGGGTTTCGGCGGAA <SEQ ID 1023> | XhoI
ORF 101 | Forward | GCGGATCCCATATG-ATTTATCAAAGAAACCTC <SEQ ID 1024> | BamHI-NdeI
ORF 101 | Reverse | CCCGCTCGAG-[illegible] <SEQ ID 1025> | XhoI
ORF 102 | Forward | GCGGATCCCATATG-GCAGGGCTGTTTTACC <SEQ ID 1026> | BamHI-NdeI
ORF 102 | Reverse | CCCGCTCGAG-AAA[illegible] <SEQ ID 1027> | XhoI
ORF 103 | Forward | GCGGATCCCATATG-AACCACGACATCAC <SEQ ID 1028> | BamHI-NdeI
ORF 103 | Reverse | CCCGCTCGAG-CAGCCACAGGACGGC <SEQ ID 1029> | XhoI
ORF 104 | Forward | GCGGATCCCATATG-ACGTGGGGAACGC <SEQ ID 1030> | BamHI-NdeI
ORF 104 | Reverse | CCCGCTCGAG-GCGGCGTTTGAACGGC <SEQ ID 1031> | XhoI
ORF 105 | Forward | GCGGATCCCATATG-ACCAAATTTCAAACCCCTC <SEQ ID 1032> | BamHI-NdeI
ORF 105 | Reverse | CCCGCTCGAG-TAAACGAATGCCGTCCAG <SEQ ID 1033> | XhoI
ORF 106 | Forward | GCGGATCCCATATG-AGGATAACCGACGGCG <SEQ ID 1034> | BamHI-NdeI
ORF 106 | Reverse | CCCGCTCGAG-TTTGTTCCCGATGATGTT <SEQ ID 1035> | XhoI
ORF 109 | Forward | GCGGATCCCATATG-GAAGATTTATATATAATACTCG <SEQ ID 1036> | BamHI-NdeI
ORF 109 | Reverse | CCCGCTCGAG-ATCAGCTTCGAACCGAAG <SEQ ID 1037> | XhoI
ORF 110 | Forward | AAAGAATTC-ATGAGTAAATCCCGTAGATCTCCC <SEQ ID 1038> | EcoRI
ORF 110 | Reverse | AAACTGCAG-GGAAAACCACATCCGCACTCTGCC <SEQ ID 1039> | PstI
ORF 111 | Forward | AAAGAATTC-GCACCGCAAAAGGCAAAAACCGCA <SEQ ID 1040> | EcoRI
ORF 111 | Reverse | AAACTGCAG-TCTGCGCGTTTTCGGGCAGGGTGG <SEQ ID 1041> | PstI
ORF 113 | Forward | AAAGAATTC-ATGAACAAAACCCTCTATCGTGTGATTTTCAACCG <SEQ ID 1042> | EcoRI
ORF 113 | Reverse | AAACTGCAG-TTACGAATGCCTGCTTGCTCGACCGTACTG <SEQ ID 1043> | PstI
ORF 115 | Forward | AAAGAATTC-TTGCTTGTGCAAACAGAAAAAGACGG <SEQ ID 1044> | EcoRI
ORF 115 | Reverse | AAAAAAGTCGAC-CTATTTTTTAGGGGCrTTTGCITGTTTGAAAAGCCTGCC <SEQ ID 1045> | SalI
ORF 119 | Forward | AAAGAATTC-TACAACATGTATCAGGAAAACCAATACCG <SEQ ID 1046> | EcoRI
ORF 119 | Reverse | AAACTGCAG-TTATGAAAACAGGCGCAGGGCGGTTTTGCC <SEQ ID 1047> | PstI
ORF 120 | Forward | AAAGAATTC-GCAAGGCTACCCCAATCCGCCGTG <SEQ ID 1048> | EcoRI
ORF 120 | Reverse | AAACTGCAG-CGGTTTGGCTGCCTGGCCGTTGAT <SEQ ID 1049> | PstI
ORF 121 | Forward | AAAGAATTC-GCCTTGGTCTGGCTGGTTTTCGC <SEQ ID 1050> | EcoRI
ORF 121 | Reverse | AAACTGCAG-TCATCCGCCACCCCACCTCGGCCATCCATC <SEQ ID 1051> | PstI
ORF 122 | Forward | AAAAAAGTCGAC-ATGTCTTACCGCGCAAGCAGTTCTCC <SEQ ID 1052> | SalI
ORF 122 | Reverse | AAACTGCAG-TCAGGAACACAAACGATGACGAATATCCGTATC <SEQ ID 1053> | PstI
ORF 125 | Forward | AAAGAATTC-GCGCTGTTTTTTGCGGCGGCGTAT <SEQ ID 1054> | EcoRI
ORF 125 | Reverse | AAACTGCAG-CGCCGTTTCAAGACGAAAAAGTCG <SEQ ID 1055> | PstI
ORF 126 | Forward | AAAGAATTC-GCGGAAACGGTCGAAG <SEQ ID 1056> | EcoRI
ORF 126 | Reverse | AAACTGCAG-TTAATCTTGTCTTCCGATATAC <SEQ ID 1057> | PstI
ORF 127 | Forward | [illegible] <SEQ ID 1058> | EcoRI
ORF 127 | Reverse | AAAAAAGTCGAC-CTTAAGTAACTTGCAGTCCTTATC <SEQ ID 1059> | SalI
ORF 128 | Forward | AAAGAATTC-ATGCAAGCTGTCCGCTACAGGCC <SEQ ID 1060> | EcoRI
ORF 128 | Reverse | AAACTGCAG-CTATTGCAATGCGCCGCCGCGGGAATG2TTGAGCAGGCG <SEQ ID 1061> | PstI
ORF 129 | Forward | AAAGAATTC-ATGGATTTTCGTTTTGACATTATTTACGAATACCG <SEQ ID 1062> | EcoRI
ORF 129 | Reverse | AAACTGCAG-TTATTTTTTGATGAAATTTTGGGGCGG <SEQ ID 1063> | PstI
ORF 130 | Forward | AAAGAATTC-GCAGTACTTGCCATTCTCGGTGCG <SEQ ID 1064> | EcoRI
ORF 130 | Reverse | AAACTGCAG-CTCCGGATCGTCTGTAAACGCATT <SEQ ID 1065> | PstI
ORF 131 | Forward | GCGGATCCCATATG-GAAATTCGGGCAATAAAAT <SEQ ID 1066> | BamHI-NdeI
ORF 131 | Reverse | CCCGCTCGAG-CCAGCGGACGCGTTC <SEQ ID 1067> | XhoI
ORF 132 | Forward | GCGGATCCCATATG-AAAGAAGCGGGGTTTG <SEQ ID 1068> | BamHI-NdeI
ORF 132 | Reverse | CCCGCTCGAG-CCAATCTGCCAGCCGT <SEQ ID 1069> | XhoI
ORF 133 | Forward | CGCGGATCCCATATG-GAAGATGCAGGGCGCG <SEQ ID 1070> | BamHI-NdeI
ORF 133 | Reverse | CCCGCTCGAG-AAACTTGTAGCTCATCGT <SEQ ID 1071> | XhoI
ORF 134 | Forward | GCGGATCCCATATG-TCTGTGCAAGCAGTATTG <SEQ ID 1072> | BamHI-NdeI
ORF 134 | Reverse | CCCGCTCGAG-ATCCTGTGCCAATGCG <SEQ ID 1073> | XhoI
ORF 135 | Forward | GCGGATCCCATATG-CCGTCTGAAAAAGCTTT <SEQ ID 1074> | BamHI-NdeI
ORF 135 | Reverse | CCCGCTCGAG-AAATACCGCTGAGGATG <SEQ ID 1075> | XhoI
ORF 136 | Forward | CGCGGATCCGCTAGC-ATGAAGCGGCGTATAGCC <SEQ ID 1076> | BamHI-NheI
ORF 136 | Reverse | CCCGCTCGAG-TTCCGAATATTTGGAACTTTT <SEQ ID 1077> | XhoI
ORF 137 | Forward | CGCGGATCCCATATG-GGCACGGCGGGAAATA <SEQ ID 1078> | BamHI-NdeI
ORF 137 | Reverse | CCCGCTCGAG-ATAACGGTATGCCGCC <SEQ ID 1079> | XhoI
ORF 138 | Forward | GCGGATCCCATATG-TTTCGTTTACAATTCAGGC <SEQ ID 1080> | BamHI-NdeI
ORF 138 | Reverse | CCCGCTCGAG-CGGCGTTTTATAGCGG <SEQ ID 1081> | XhoI
ORF 139 | Forward | GCGGATCCCATATG-GCTTTTTTGGCGGTAATG <SEQ ID 1082> | BamHI-NdeI
ORF 139 | Reverse | CCCGCTCGAG-TAACGTTTCCGTGCGTTT <SEQ ID 1083> | XhoI
ORF 140 | Forward | GCGGATCCCATATG-TTGCCCACAGGCAGC <SEQ ID 1084> | BamHI-NdeI
ORF 140 | Reverse | CCCGCTCGAG-GACGATGGCAAACAGC <SEQ ID 1085> | XhoI
ORF 141 | Forward | GCGGATCCCATATG-CCGTCTGAAGCAGTCT <SEQ ID 1086> | BamHI-NdeI
ORF 141 | Reverse | CCCGCTCGAG-ATCTGTTGTTTTTAAAATATT <SEQ ID 1087> | XhoI
ORF 142 | Forward | GCGGATCCCATATG-GATAATTCTGGTAGTGAAG <SEQ ID 1088> | BamHI-NdeI
ORF 142 | Reverse | CCCGCTCGAG-AAACGTATAGCCTACCT <SEQ ID 1089> | XhoI
ORF 143 | Forward | GCGGATCCCATATG-GATACCGCTTTGAACCT <SEQ ID 1090> | BamHI-NdeI
ORF 143 | Reverse | CCCGCTCGAG-AATGGCTTCCGCAATATG <SEQ ID 1091> | XhoI
ORF 144 | Forward | GCGGATCCCATATG-ACCTTTTTACAACGTTTGC <SEQ ID 1092> | BamHI-NdeI
ORF 144 | Reverse | CCCGCTCGAG-AGATTGTTGTTGTTTTTTCG <SEQ ID 1093> | XhoI
ORF 147 | Forward | GCGGATCCCATATG-TCTGTCTTTCAAACGGC <SEQ ID 1094> | BamHI-NdeI
ORF 147 | Reverse | CCCGCTCGAG-TTTGTTTTTGCAAGACAG <SEQ ID 1095> | XhoI



NB: 

- restriction sites are underlined 
- for ORFs 110-130, where the ORF itself carries an EcoRI site (eg. ORF122), a SalI site
was used in the forward primer instead. Similarly, where the ORF carries a PstI site (eg.
ORFs 115 and 127), a SalI site was used in the reverse primer.
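
Checking which restriction sites a primer tail carries, or whether an ORF contains an internal site that would rule out a given enzyme, amounts to a simple substring search. The sketch below is illustrative only: the function name is hypothetical and the site table is limited to the enzymes named in this section, using their standard recognition sequences. All of these sites are palindromic, so searching one strand is sufficient.

```python
# Standard recognition sequences for the enzymes named in this section.
SITES = {
    "BamHI": "GGATCC", "NdeI": "CATATG", "NheI": "GCTAGC", "XhoI": "CTCGAG",
    "EcoRI": "GAATTC", "NcoI": "CCATGG", "PstI": "CTGCAG", "SalI": "GTCGAC",
}


def find_sites(seq):
    """Return the names of the sites found in seq, ordered by position."""
    s = seq.upper().replace("-", "").replace(" ", "")
    hits = []
    for name, site in SITES.items():
        pos = s.find(site)
        while pos != -1:                 # record every occurrence
            hits.append((pos, name))
            pos = s.find(site, pos + 1)
    return [name for _, name in sorted(hits)]
```

Applied to an ORF sequence rather than a primer, a non-empty result for EcoRI or PstI would indicate the situation described in the note above, where a SalI site is substituted in the corresponding primer.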

Oligos were synthesized by a Perkin Elmer 394 DNA/RNA Synthesizer, eluted from the columns
in 2ml NH4OH, and deprotected by 5 hours incubation at 56°C. The oligos were precipitated by
addition of 0.3M Na-Acetate and 2 volumes ethanol. The samples were then centrifuged and the
pellets resuspended in either 100µl or 1ml of water. OD260 was determined using a Perkin Elmer
Lambda Bio spectrophotometer and the concentration was determined and adjusted to 2-10pmol/µl.

C) Amplification 

The standard PCR protocol was as follows: 50-200ng of genomic DNA were used as a template
in the presence of 20-40µM of each oligo, 400-800µM dNTPs solution, 1x PCR buffer (including
1.5mM MgCl2), 2.5 units Taq DNA polymerase (using Perkin-Elmer AmpliTaq, GIBCO
Platinum, Pwo DNA polymerase, or Takara Shuzo Taq polymerase).

In some cases, PCR was optimised by the addition of 10µl DMSO or 50µl 2M betaine.

After a hot start (adding the polymerase during a preliminary 3 minute incubation of the whole mix
at 95°C), each sample underwent a double-step amplification: the first 5 cycles were performed
using the hybridization temperature of the oligos excluding the restriction enzyme tail,
followed by 30 cycles performed according to the hybridization temperature of the whole length
oligos. The cycles were followed by a final 10 minute extension step at 72°C.

The standard cycles were as follows:

               | Denaturation     | Hybridisation       | Elongation
First 5 cycles | 30 seconds, 95°C | 30 seconds, 50-55°C | 30-60 seconds, 72°C
Last 30 cycles | 30 seconds, 95°C | 30 seconds, 65-70°C | 30-60 seconds, 72°C


The elongation time varied according to the length of the ORF to be amplified. 
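
The two-stage cycling scheme described above can be written down as a small data structure, which also makes it easy to estimate the total hold time of a run. This is a sketch only: the function names and the tuple layout are hypothetical, the hybridisation temperatures and elongation time are supplied by the caller, and ramp times between steps are ignored.

```python
def build_program(hyb_temp_region_c, hyb_temp_whole_c, elongation_s=60):
    """Return the program as a list of (stage, repeats, [(step, temp_C, seconds)])."""
    def cycle(hyb_temp_c):
        return [
            ("denaturation", 95, 30),
            ("hybridisation", hyb_temp_c, 30),
            ("elongation", 72, elongation_s),
        ]

    return [
        ("hot start", 1, [("denaturation", 95, 180)]),      # 3 minutes at 95 C
        ("first 5 cycles", 5, cycle(hyb_temp_region_c)),    # tail excluded
        ("last 30 cycles", 30, cycle(hyb_temp_whole_c)),    # whole oligo
        ("final extension", 1, [("elongation", 72, 600)]),  # 10 minutes at 72 C
    ]


def total_hold_seconds(program):
    """Sum of all hold times, ignoring temperature ramps."""
    return sum(repeats * sum(sec for _, _, sec in steps)
               for _, repeats, steps in program)
```

With a 60 second elongation step, `total_hold_seconds(build_program(52, 67))` comes to 4980 seconds, i.e. 83 minutes of hold time before ramping is taken into account.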



CHIR-0160 (356.001) PATENT 


The amplifications were performed using either a 9600 or a 2400 Perkin Elmer GeneAmp PCR
System. To check the results, 1/10 of the amplification volume was loaded onto a 1-1.5% agarose
gel and the size of each amplified fragment compared with a DNA molecular weight marker.

The amplified DNA was either loaded directly on a 1% agarose gel or first precipitated with ethanol
and resuspended in a suitable volume to be loaded on a 1% agarose gel. The DNA fragment
corresponding to the band of the correct size was then eluted and purified from the gel, using the Qiagen Gel
Extraction Kit, following the instructions of the manufacturer. The final volume of the DNA
fragment was 30µl or 50µl of either water or 10mM Tris, pH 8.5.

D) Digestion of PCR fragments 

The purified DNA corresponding to the amplified fragment was split into 2 aliquots and
double-digested with:

- NdeI/XhoI or NheI/XhoI for cloning into pET-21b+ and further expression of the protein
as a C-terminus His-tag fusion

- BamHI/XhoI or EcoRI/XhoI for cloning into pGEX-KG and further expression of the
protein as N-terminus GST fusion.

- For ORF 76, NheI/BamHI for cloning into pTRC-HisA vector and further expression
of the protein as N-terminus His-tag fusion.

- EcoRI/PstI, EcoRI/SalI or SalI/PstI for cloning into pGEX-His and further expression of
the protein as N-terminus His-tag fusion

Each purified DNA fragment was incubated (37°C for 3 hours to overnight) with 20 units of each
restriction enzyme (New England Biolabs) in either a 30 or 40µl final volume in the presence of
the appropriate buffer. The digestion product was then purified using the QIAquick PCR
purification kit, following the manufacturer's instructions, and eluted in a final volume of 30 or
50µl of either water or 10mM Tris-HCl, pH 8.5. The final DNA concentration was determined by
1% agarose gel electrophoresis in the presence of titrated molecular weight marker.

E) Digestion of the cloning vectors (pET22B, pGEX-KG, pTRC-HisA, and pGEX-His)

10µg plasmid was double-digested with 50 units of each restriction enzyme in 200µl reaction
volume in the presence of appropriate buffer by overnight incubation at 37°C. After loading the
whole digestion on a 1% agarose gel, the band corresponding to the digested vector was purified
from the gel using the Qiagen QIAquick Gel Extraction Kit and the DNA was eluted in 50µl of
10mM Tris-HCl, pH 8.5. The DNA concentration was evaluated by measuring OD260 of the sample,
and adjusted to 50µg/µl. [Amount illegible] of plasmid was used for each cloning procedure.

The vector pGEX-His is a modified pGEX-2T vector carrying a region encoding six histidine
residues upstream to the thrombin cleavage site and containing the multiple cloning site of the
vector pTRC99 (Pharmacia).

F) Cloning 

The fragments corresponding to each ORF, previously digested and purified, were ligated in both
pET22b and pGEX-KG. In a final volume of 20µl, a molar ratio of 3:1 fragment/vector was ligated using
0.5µl of NEB T4 DNA ligase (400 units/µl), in the presence of the buffer supplied by the
manufacturer. The reaction was incubated at room temperature for 3 hours. In some experiments,
ligation was performed using the Boehringer "Rapid Ligation Kit", following the manufacturer's
instructions.
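
A 3:1 fragment/vector molar ratio translates into a mass of insert that depends on the relative lengths of the two molecules, since for double-stranded DNA the mass per mole is proportional to length. The following sketch shows the usual arithmetic; the function name and the example sizes are hypothetical and are not taken from the text.

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, molar_ratio=3.0):
    """Mass of insert (ng) giving the requested insert:vector molar ratio.

    Assumes both molecules are double-stranded DNA, so that mass per mole
    scales linearly with length in base pairs.
    """
    if vector_bp <= 0 or insert_bp <= 0:
        raise ValueError("lengths must be positive")
    return vector_ng * (insert_bp / vector_bp) * molar_ratio
```

For instance, with 50ng of a hypothetical 5000bp vector and a 1000bp fragment, a 3:1 ratio corresponds to about 30ng of fragment.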

In order to introduce the recombinant plasmid into a suitable strain, 100µl E. coli DH5 competent
cells were incubated with the ligase reaction solution for 40 minutes on ice, then at 37°C for 3
minutes, then, after adding 800µl LB broth, again at 37°C for 20 minutes. The cells were then
centrifuged at maximum speed in an Eppendorf microfuge and resuspended in approximately 200µl
of the supernatant. The suspension was then plated on LB ampicillin (100mg/ml).

The screening of the recombinant clones was performed by growing 5 randomly-chosen colonies
overnight at 37°C in either 2ml (pGEX or pTRC clones) or 5ml (pET clones) LB broth + 100µg/ml
ampicillin. The cells were then pelleted and the DNA extracted using the Qiagen QIAprep Spin
Miniprep Kit, following the manufacturer's instructions, to a final volume of 30µl. 5µl of each
individual miniprep (approximately 1µg) were digested with either NdeI/XhoI or BamHI/XhoI and
the whole digestion loaded onto a 1-1.5% agarose gel (depending on the expected insert size), in
parallel with the molecular weight marker (1Kb DNA Ladder, GIBCO). Positive clones were
identified on the basis of the correct insert size.

For the cloning of ORFs 110, 111, 113, 115, 119, 122, 125 & 130, the double-digested PCR
product was ligated into double-digested vector using EcoRI-PstI cloning sites or, for ORFs 115
& 127, EcoRI-SalI or, for ORF 122, SalI-PstI. After cloning, the recombinant plasmids were
introduced in the E. coli host W3110. Individual clones were grown overnight at 37°C in L-broth
with 50µl/ml ampicillin.

G) Expression 

Each ORF cloned into the expression vector was transformed into the strain suitable for expression
of the recombinant protein product. 1µl of each construct was used to transform 30µl of E. coli
BL21 (pGEX vector), E. coli TOP10 (pTRC vector) or E. coli BL21-DE3 (pET vector), as described
above. In the case of the pGEX-His vector, the same E. coli strain (W3110) was used for initial
cloning and expression. Single recombinant colonies were inoculated into 2ml LB+Amp
(100µg/ml), incubated at 37°C overnight, then diluted 1:30 in 20ml of LB+Amp (100µg/ml) in
100ml flasks, making sure that the OD600 ranged between 0.1 and 0.15. The flasks were incubated
at 30°C in gyratory water bath shakers until the OD indicated exponential growth suitable for
induction of expression (0.4-0.8 OD for pET and pTRC vectors; 0.8-1 OD for pGEX and pGEX-His
vectors). For the pET, pTRC and pGEX-His vectors, the protein expression was induced by
addition of 1mM IPTG, whereas in the case of the pGEX system the final concentration of IPTG was
0.2mM. After 3 hours incubation at 30°C, the final concentration of the sample was checked by
OD. In order to check expression, 1ml of each sample was removed, centrifuged in a microfuge,
the pellet resuspended in PBS, and analysed by 12% SDS-PAGE with Coomassie Blue staining.
The whole sample was centrifuged at 6000g and the pellet resuspended in PBS for further use.

H) GST-fusion proteins large-scale purification. 

25 A single colony was grown overnight at 37°C on LB+Amp agar plate. The bacteria were inoculated 
into 20ml of LB+Amp liquid colture in a water bath shaker and grown overnight. Bacteria were 
diluted 1:30 into 600ml of fresh medium and allowed to grow at the optimal temperature (20-37°C) 
to OD 550 0.8-1. Protein expression was induced with 0.2mM IPTG followed by three hours 
incubation. The culture was centrifuged at 8000rpm at 4°C. The supernatant was discarded and the 
bacterial pellet was resuspended in 7.5ml cold PBS. The cells were disrupted by sonication on ice 
for 30 sec at 40W using a Branson sonifier B-15, frozen and thawed twice and centrifuged again. 
The supernatant was collected and mixed with 150µl Glutathione-Sepharose 4B resin (Pharmacia) 
(previously washed with PBS) and incubated at room temperature for 30 minutes. The sample was 
centrifuged at 700g for 5 minutes at 4°C. The resin was washed twice with 10ml cold PBS for 10 
minutes, resuspended in 1ml cold PBS, and loaded on a disposable column. The resin was washed 
twice with 2ml cold PBS until the flow-through reached OD280 of 0.02-0.06. The GST-fusion 
protein was eluted by addition of 700µl cold Glutathione elution buffer (10mM reduced 
glutathione, 50mM Tris-HCl) and fractions collected until the OD280 was 0.1. 21µl of each fraction 
were loaded on a 12% SDS gel using either Biorad SDS-PAGE Molecular weight standard broad 
range (M1) (200, 116.25, 97.4, 66.2, 45, 31, 21.5, 14.4, 6.5 kDa) or Amersham Rainbow Marker 
(M2) (220, 66, 46, 30, 21.5, 14.3 kDa) as standards. As the MW of GST is 26kDa, this value must 
be added to the MW of each GST-fusion protein. 
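
As a worked illustration of the last point, the following Python sketch adds the 26kDa GST contribution to a protein's own MW and finds the two M1 standards that bracket the expected band. The function names are hypothetical; the marker values and the GST mass are those given above.

```python
GST_KDA = 26.0  # MW of GST as stated in the text

# Biorad broad-range standard (M1) sizes in kDa, as listed in the text.
M1_MARKERS = [200, 116.25, 97.4, 66.2, 45, 31, 21.5, 14.4, 6.5]


def expected_fusion_mw(protein_kda: float) -> float:
    """Expected MW of a GST-fusion: the protein's own MW plus GST."""
    return protein_kda + GST_KDA


def bracketing_markers(mw_kda: float, markers=M1_MARKERS):
    """Return the nearest marker at or below and at or above the given MW."""
    lower = max((m for m in markers if m <= mw_kda), default=None)
    upper = min((m for m in markers if m >= mw_kda), default=None)
    return lower, upper
```

A protein of 11kDa would therefore be expected at about 37kDa as a GST-fusion, between the 31 and 45 kDa standards.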

I) His-fusion solubility analysis (ORFs 111-129) 

To analyse the solubility of the His-fusion expression products, pellets of 3ml cultures were 
resuspended in buffer M1 [500µl PBS pH 7.2]. 25µl lysozyme (10mg/ml) was added and the 
bacteria were incubated for 15 min at 4°C. The pellets were sonicated for 30 sec at 40W using a 
Branson sonifier B-15, frozen and thawed twice and then separated again into pellet and 
supernatant by a centrifugation step. The supernatant was collected and the pellet was resuspended 
in buffer M2 [8M urea, 0.5M NaCl, 20mM imidazole and 0.1M NaH2PO4] and incubated for 3 to 
4 hours at 4°C. After centrifugation, the supernatant was collected and the pellet was resuspended 
in buffer M3 [6M guanidinium-HCl, 0.5M NaCl, 20mM imidazole and 0.1M NaH2PO4] overnight 
at 4°C. The supernatants from all steps were analysed by SDS-PAGE. 

The proteins expressed from ORFs 113, 119 and 120 were found to be soluble in PBS, whereas 
ORFs 111, 122, 126 and 129 needed urea and ORFs 125 and 127 needed guanidinium-HCl for their 
solubilization. 

J) His-fusion large-scale purification. 

A single colony was grown overnight at 37°C on an LB+Amp agar plate. The bacteria were 
inoculated into 20ml of LB+Amp liquid culture and incubated overnight in a water bath shaker. 
Bacteria were diluted 1:30 into 600ml fresh medium and allowed to grow at the optimal 
temperature (20-37°C) to OD550 0.6-0.8. Protein expression was induced by addition of 1mM IPTG 
and the culture further incubated for three hours. The culture was centrifuged at 8000rpm at 4°C, 
the supernatant was discarded and the bacterial pellet was resuspended in 7.5ml of either (i) cold 
buffer A (300mM NaCl, 50mM phosphate buffer, 10mM imidazole, pH 8) for soluble proteins or 
(ii) buffer B (urea 8M, 10mM Tris-HCl, 100mM phosphate buffer, pH 8.8) for insoluble proteins. 

The cells were disrupted by sonication on ice for 30 sec at 40W using a Branson sonifier B-15, 
frozen and thawed two times and centrifuged again. 

For insoluble proteins, the supernatant was stored at -20°C, while the pellets were resuspended in 
2ml buffer C (6M guanidine hydrochloride, 100mM phosphate buffer, 10mM Tris-HCl, pH 7.5) 
and treated in a homogenizer for 10 cycles. The product was centrifuged at 13000rpm for 40 
minutes. 

Supernatants were collected and mixed with 150µl Ni2+-resin (Pharmacia) (previously washed with 
either buffer A or buffer B, as appropriate) and incubated at room temperature with gentle agitation 
for 30 minutes. The sample was centrifuged at 700g for 5 minutes at 4°C. The resin was washed 
twice with 10ml buffer A or B for 10 minutes, resuspended in 1ml buffer A or B and loaded on a 
disposable column. The resin was washed at either (i) 4°C with 2ml cold buffer A or (ii) room 
temperature with 2ml buffer B, until the flow-through reached OD280 of 0.02-0.06. 

The resin was washed with either (i) 2ml cold 20mM imidazole buffer (300mM NaCl, 50mM 
phosphate buffer, 20mM imidazole, pH 8) or (ii) buffer D (urea 8M, 10mM Tris-HCl, 100mM 
phosphate buffer, pH 6.3) until the flow-through reached an OD280 of 0.02-0.06. The His-fusion 
protein was eluted by addition of 700µl of either (i) cold elution buffer A (300mM NaCl, 50mM 
phosphate buffer, 250mM imidazole, pH 8) or (ii) elution buffer B (urea 8M, 10mM Tris-HCl, 
100mM phosphate buffer, pH 4.5) and fractions collected until the OD280 was 0.1. 21µl of each 
fraction were loaded on a 12% SDS gel. 

K) His-fusion proteins renaturation 

10% glycerol was added to the denatured proteins. The proteins were then diluted to 20µg/ml using 
dialysis buffer I (10% glycerol, 0.5M arginine, 50mM phosphate buffer, 5mM reduced glutathione, 
0.5mM oxidised glutathione, 2M urea, pH 8.8) and dialysed against the same buffer at 4°C for 
12-14 hours. The protein was further dialysed against dialysis buffer II (10% glycerol, 0.5M arginine, 
50mM phosphate buffer, 5mM reduced glutathione, 0.5mM oxidised glutathione, pH 8.8) for 12-14 
hours at 4°C. Protein concentration was evaluated using the formula: 

Protein (mg/ml) = (1.55 x OD280) - (0.76 x OD260) 
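
The formula can be applied directly to a pair of absorbance readings, as in this minimal Python sketch. The function names and the example readings are assumptions for illustration; the coefficients and the 20µg/ml target are those stated in the text.

```python
def protein_mg_per_ml(od280: float, od260: float) -> float:
    """Protein concentration from the formula in the text."""
    return 1.55 * od280 - 0.76 * od260


def fold_dilution_to_target(conc_mg_per_ml: float, target_ug_per_ml: float = 20.0) -> float:
    """Fold dilution needed to bring a stock to the target concentration (µg/ml)."""
    return conc_mg_per_ml * 1000.0 / target_ug_per_ml


# Hypothetical readings: OD280 = 1.0, OD260 = 0.5
conc = protein_mg_per_ml(1.0, 0.5)
print(f"{conc:.2f} mg/ml, dilute {fold_dilution_to_target(conc):.1f}-fold")
```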

L) His-fusion large-scale purification (ORFs 111-129) 

500ml of bacterial cultures were induced and the fusion proteins were obtained soluble in buffer 
M1, M2 or M3 using the procedure described above. The crude extract of the bacteria was loaded 
onto a Ni-NTA superflow column (Qiagen) equilibrated with buffer M1, M2 or M3 depending 
on the solubilization buffer of the fusion proteins. Unbound material was eluted by washing the 
column with the same buffer. The specific protein was eluted with the corresponding buffer 
containing 500mM imidazole and dialysed against the corresponding buffer without imidazole. 
After each run the columns were sanitized by washing with at least two column volumes of 0.5 M 
sodium hydroxide and reequilibrated before the next use. 

M) Mice immunisations 

20µg of each purified protein were used to immunise mice intraperitoneally. In the case of ORFs 
2, 4, 15, 22, 27, 28, 37, 76, 89 and 97, Balb-C mice were immunised with Al(OH)3 as adjuvant on 
days 1, 21 and 42, and immune response was monitored in samples taken on day 56. For ORFs 44, 
106 and 132, CD1 mice were immunised using the same protocol. For ORFs 25 and 40, CD1 mice 
were immunised using Freund's adjuvant, rather than Al(OH)3, and the same immunisation 
protocol was used, except that the immune response was measured on day 42, rather than 56. 
Similarly, for ORFs 23, 32, 38 and 79, CD1 mice were immunised with Freund's adjuvant, but the 
immune response was measured on day 49. 

N) ELISA assay (sera analysis) 

The acapsulated MenB M7 strain was plated on chocolate agar plates and incubated overnight at 
37°C. Bacterial colonies were collected from the agar plates using a sterile dacron swab and 
inoculated into 7ml of Mueller-Hinton Broth (Difco) containing 0.25% Glucose. Bacterial growth 
was monitored every 30 minutes by following OD620. The bacteria were allowed to grow until the OD 
reached the value of 0.3-0.4. The culture was centrifuged for 10 minutes at 10000rpm. The 
supernatant was discarded and bacteria were washed once with PBS, resuspended in PBS 
containing 0.025% formaldehyde, and incubated for 2 hours at room temperature and then 
overnight at 4°C with stirring. 100µl bacterial cells were added to each well of a 96 well Greiner 
plate and incubated overnight at 4°C. The wells were then washed three times with PBT washing 
buffer (0.1% Tween-20 in PBS). 200µl of saturation buffer (2.7% Polyvinylpyrrolidone 10 in 
water) was added to each well and the plates incubated for 2 hours at 37°C. Wells were washed 
three times with PBT. 200µl of diluted sera (Dilution buffer: 1% BSA, 0.1% Tween-20, 0.1% NaN3 
in PBS) were added to each well and the plates incubated for 90 minutes at 37°C. Wells were 
washed three times with PBT. 100µl of HRP-conjugated rabbit anti-mouse (Dako) serum diluted 
1:2000 in dilution buffer were added to each well and the plates were incubated for 90 minutes at 
37°C. Wells were washed three times with PBT buffer. 100µl of substrate buffer for HRP (25ml 
of citrate buffer pH5, 10mg of O-phenylenediamine and 10µl of H2O) were added to each well and the 
plates were left at room temperature for 20 minutes. 100µl H2SO4 was added to each well and OD490 
was followed. The ELISA was considered positive when OD490 was 2.5 times that of the respective 
pre-immune sera. 
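
The positivity criterion can be expressed as a one-line check, sketched below in Python. The function name is hypothetical, and treating the criterion as "at least 2.5 times" the pre-immune reading is an assumption; the text states only the 2.5-fold figure.

```python
def elisa_positive(od490_immune: float, od490_preimmune: float, fold: float = 2.5) -> bool:
    """True when the immune serum OD490 reaches `fold` times the pre-immune OD490.

    Assumption: the threshold is inclusive (>=).
    """
    return od490_immune >= fold * od490_preimmune
```

With a pre-immune reading of 0.3, an immune reading of 1.0 would be scored positive (threshold 0.75), whereas 0.5 would not.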

O) FACScan bacteria Binding Assay procedure. 

The acapsulated MenB M7 strain was plated on chocolate agar plates and incubated overnight at 
37°C. Bacterial colonies were collected from the agar plates using a sterile dacron swab and 
inoculated into 4 tubes containing 8ml each of Mueller-Hinton Broth (Difco) containing 0.25% 
glucose. Bacterial growth was monitored every 30 minutes by following OD620. The bacteria were 
allowed to grow until the OD reached the value of 0.35-0.5. The culture was centrifuged for 10 minutes 
at 4000rpm. The supernatant was discarded and the pellet was resuspended in blocking buffer (1% 
BSA, 0.4% NaN3) and centrifuged for 5 minutes at 4000rpm. Cells were resuspended in blocking 
buffer to reach OD620 of 0.07. 100µl bacterial cells were added to each well of a Costar 96 well 
plate. 100µl of diluted (1:200) sera (in blocking buffer) were added to each well and plates 
incubated for 2 hours at 4°C. Cells were centrifuged for 5 minutes at 4000rpm, the supernatant 
aspirated and cells washed by addition of 200µl/well of blocking buffer. 100µl of 
R-Phycoerythrin conjugated F(ab)2 goat anti-mouse, diluted 1:100, was added to each well and plates 
incubated for 1 hour at 4°C. Cells were spun down by centrifugation at 4000rpm for 5 minutes and 
washed by addition of 200µl/well of blocking buffer. The supernatant was aspirated and cells 
resuspended in 200µl/well of PBS, 0.25% formaldehyde. Samples were transferred to FACScan 
tubes and read. The FACScan settings were: FL1 on, FL2 and FL3 off; FSC-H 
threshold: 92; FSC PMT Voltage: E 02; SSC PMT: 474; Amp. Gains 7.1; FL-2 PMT: 539; 
compensation values: 0. 

P) OMV preparations 

Bacteria were grown overnight on 5 GC plates, harvested with a loop and resuspended in 10 ml 
20mM Tris-HCl. Heat inactivation was performed at 56°C for 30 minutes and the bacteria disrupted 
by sonication for 10 minutes on ice (50% duty cycle, 50% output). Unbroken cells were removed 
by centrifugation at 5000g for 10 minutes and the total cell envelope fraction recovered by 
centrifugation at 50000g at 4°C for 75 minutes. To extract cytoplasmic membrane proteins from 
the crude outer membranes, the whole fraction was resuspended in 2% sarkosyl (Sigma) and 
incubated at room temperature for 20 minutes. The suspension was centrifuged at 10000g for 10 
minutes to remove aggregates, and the supernatant further ultracentrifuged at 50000g for 75 
minutes to pellet the outer membranes. The outer membranes were resuspended in 10mM Tris-HCl, 
pH8 and the protein concentration measured by the Bio-Rad Protein assay, using BSA as a 
standard. 

Q) Whole Extracts preparation 

Bacteria were grown overnight on a GC plate, harvested with a loop and resuspended in 1ml of 
20mM Tris-HCl. Heat inactivation was performed at 56°C for 30 minutes. 

R) Western blotting 

Purified proteins (500ng/lane), outer membrane vesicles (5µg) and total cell extracts (25µg) derived 
from MenB strain 2996 were loaded on 15% SDS-PAGE and transferred to a nitrocellulose 
membrane. The transfer was performed for 2 hours at 150mA at 4°C, in transferring buffer (0.3% 
Tris base, 1.44% glycine, 20% methanol). The membrane was saturated by overnight incubation 
at 4°C in saturation buffer (10% skimmed milk, 0.1% Triton X100 in PBS). The membrane was 
washed twice with washing buffer (3% skimmed milk, 0.1% Triton X100 in PBS) and incubated 
for 2 hours at 37°C with mice sera diluted 1:200 in washing buffer. The membrane was washed 
twice and incubated for 90 minutes with a 1:2000 dilution of horseradish peroxidase labelled 
anti-mouse Ig. The membrane was washed twice with 0.1% Triton X100 in PBS and developed with 
the Opti-4CN Substrate Kit (Bio-Rad). The reaction was stopped by adding water. 

S) Bactericidal assay 

MC58 strain was grown overnight at 37°C on chocolate agar plates. 5-7 colonies were collected and 
used to inoculate 7ml Mueller-Hinton broth. The suspension was incubated at 37°C on a nutator 
and allowed to grow until OD620 was 0.5-0.8. The culture was aliquoted into sterile 1.5ml Eppendorf 
tubes and centrifuged for 20 minutes at maximum speed in a microfuge. The pellet was washed 
once in Gey's buffer (Gibco) and resuspended in the same buffer to an OD620 of 0.5, diluted 
1:20000 in Gey's buffer and stored at 25°C. 

50µl of Gey's buffer/1% BSA was added to each well of a 96-well tissue culture plate. 25µl of 
diluted mouse sera (1:100 in Gey's buffer/0.2% BSA) were added to each well and the plate 
incubated at 4°C. 25µl of the previously described bacterial suspension were added to each well. 
25µl of either heat-inactivated (56°C waterbath for 30 minutes) or normal baby rabbit complement 
were added to each well. Immediately after the addition of the baby rabbit complement, 22µl of 
each sample/well were plated on Mueller-Hinton agar plates (time 0). The 96-well plate was 
incubated for 1 hour at 37°C with rotation and then 22µl of each sample/well were plated on 
Mueller-Hinton agar plates (time 1). After overnight incubation the colonies corresponding to time 
0 and time 1 hour were counted. 
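
The text does not state how the two colony counts are combined. One common way to summarise such data, sketched here in Python purely as an assumption, is to express the time 1 count as a percentage of the time 0 count for each well; the function name and the example counts are hypothetical.

```python
def percent_survival(cfu_time0: int, cfu_time1: int) -> float:
    """Colonies at time 1 as a percentage of colonies at time 0.

    Assumption: this ratio is one possible summary, not a formula from the text.
    """
    if cfu_time0 <= 0:
        raise ValueError("time 0 count must be positive")
    return 100.0 * cfu_time1 / cfu_time0


# Hypothetical counts for a well with normal complement and its
# heat-inactivated complement control.
print(percent_survival(200, 50))   # normal complement
print(percent_survival(200, 260))  # heat-inactivated control
```

Comparing the value obtained with normal complement against the heat-inactivated control separates complement-dependent killing from other loss of viability.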

Table II gives a summary of the cloning, expression and purification results. 

TABLE II - Summary of cloning, expression and purification 

ORF        PCR/cloning   His-fusion    GST-fusion    Purification
                         expression    expression
orf 1      +             +             +             His-fusion
orf 2      +             +             +             GST-fusion
orf 2.1    +             n.d.          +             GST-fusion
orf 4      +             +             +             His-fusion
orf 5      +             n.d.                        GST-fusion
orf 6      +             +             +             GST-fusion
orf 7      +             +             +             GST-fusion
orf 8      +             n.d.          n.d.
orf 9      +             +             +             GST-fusion
orf 10     +             n.d.          n.d.
orf 11     +             n.d.          n.d.
orf 13     +             n.d.          +             GST-fusion
orf 15     +             +             +             GST-fusion
orf 17     +             n.d.          n.d.
orf 18     +             n.d.          n.d.
orf 19     +             n.d.          n.d.
orf 20     +             n.d.          n.d.
orf 22     +             +             +             GST-fusion
orf 23     +             +             +             His-fusion
orf 24     +             n.d.          n.d.
orf 25     +             +             +             His-fusion
orf 26     +             n.d.          n.d.
orf 27     +             +             +             GST-fusion
orf 28     +             +             +             GST-fusion
orf 29     +             n.d.          n.d.
orf 32     +             +             +             His-fusion
orf 33     +             n.d.          n.d.
orf 35     +             n.d.          n.d.
orf 37     +             +             +             GST-fusion
orf 58     +             n.d.          n.d.
orf 65     +             n.d.          n.d.
orf 66     +             n.d.          n.d.
orf 72     +             +             n.d.          His-fusion
orf 73     +             n.d.          +             n.d.
orf 75     +             n.d.          n.d.
orf 76     +             +             n.d.          His-fusion
orf 79     +             +             n.d.          His-fusion
orf 83     +             n.d.          +             n.d.
orf 84     +             n.d.          n.d.
orf 85     +             n.d.          +             GST-fusion
orf 89     +             n.d.          +             GST-fusion
orf 97     +             +             +             GST-fusion
orf 98     +             n.d.          n.d.
orf 100    +             n.d.          n.d.
orf 101    +             n.d.          n.d.
orf 102    +             n.d.          n.d.
orf 103    +             n.d.          n.d.
orf 104    +             n.d.          n.d.
orf 105    +             n.d.          n.d.
orf 106    +             +             +             His-fusion
orf 109    +             n.d.          n.d.
orf 110    +             n.d.          n.d.
orf 111    +             +             n.d.          His-fusion
orf 113    +             +             n.d.          His-fusion
orf 115    n.d.          n.d.          n.d.
orf 119    +             +             n.d.          His-fusion
orf 120    +             +             n.d.          His-fusion
orf 121    +             n.d.          n.d.
orf 122    +             +             n.d.          His-fusion
orf 125    +             +             n.d.          His-fusion
orf 126    +             +             n.d.          His-fusion
orf 127    +             +             n.d.          His-fusion
orf 128    +             n.d.          n.d.
orf 129    +             +             n.d.          His-fusion
orf 130    +             n.d.          n.d.
orf 131    +             +             +             n.d.
orf 132    +             +             +             His-fusion
orf 133    +             n.d.          +             GST-fusion
orf 134    +             n.d.          n.d.
orf 135    +             n.d.          n.d.
orf 136    +             n.d.          n.d.
orf 137    +             n.d.          +             GST-fusion
orf 138    +             n.d.          +             GST-fusion
orf 139    +             n.d.          n.d.
orf 140    +             n.d.          n.d.
orf 141    +             n.d.          n.d.
orf 142    +             n.d.          n.d.
orf 143    +             n.d.          n.d.
orf 144    +             n.d.          +             n.d.
orf 147    +             n.d.          n.d.





Example 1 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 1>: 



1 ATGAAACAGA CAGTCAA.AT GCTTGCCGCC GCCCTGATTG CCTTGGGCTT
51 GAACCGACCG GTGTGGNCGG ATGACGTATC GGATTTTCGG GAAAACTTGC
101 A.GCGGCAGC ACAGGGAAAT GCAGCAGCCC AATACAATTT GGGCGCAATG
151 TAT.TACAAA GGACGCGCGT GCGCCGGGAT GATGCTGAAG CGGTCAGATG
201 GTATCGGCAG CCGGCGGAAC AGGGGTTAGC CCAAGCCCAA TACAATTTGG
251 GCTGGATGTA TGCCAACGGG CGCGC.GTGC GCCAAGATGA TACCGAAGCG
301 GTCAGATGGT ATCGGCAGGC GGCAGCGCAG GGGGTTGTCC AAGCCCAATA
351 CAATTTGGGC GTGATATATG CCGAAGGACG TGGAGTGCGC CAAGACGATG
401 TCGAAGCGGT CAGATGGTTT CGGCAGGCGG CAGCGCAGGG GGTAGCCCAA
451 GCCCAAAACA ATTTGGGCGT GATGTATGCC GAAAGANCGC GCGTGCGCCA
501 AGACCG...



This corresponds to the amino acid sequence <SEQ ID 2; ORF37>: 
1 MKQTVXMLAA ALIALGLNRP VWXDDVSDFR ENLXAAAQGN AAAQYNLGAM
51 YXQRTRVRRD DAEAVRWYRQ PAEQGLAQAQ YNLGWMYANG RXVRQDDTEA
101 VRWYRQAAAQ GVVQAQYNLG VIYAEGRGVR QDDVEAVRWF RQAAAQGVAQ
151 AQNNLGVMYA ERXRVRQD...

Further work revealed the complete nucleotide sequence <SEQ ID 3>: 



1 ATGAAACAGA CAGTCAAATG GCTTGCCGCC GCCCTGATTG CCTTGGGCTT 

51 GAACCGAGCG GTGTGGGCGG ATGACGTATC GGATTTTCGG GAAAACTTGC 

101 AGGCGGCAGC ACAGGGAAAT GCAGCAGCCC AATACAATTT GGGCGCAATG 

151 TATTACAAAG GACGCGGCGT GCGCCGGGAT GATGCTGAAG CGGTCAGATG 

201 GTATCGGCAG GCGGCGGAAC AGGGGTTAGC CCAAGCCCAA TACAATTTGG 

251 GCTGGATGTA TGCCAACGGG CGCGGCGTGC GCCAAGATGA TACCGAAGCG 

301 GTCAGATGGT ATCGGCAGGC GGCAGCGCAG GGGGTTGTCC AAGCCCAATA 

351 CAATTTGGGC GTGATATATG CCGAAGGACG TGGAGTGCGC CAAGACGATG 

401 TCGAAGCGGT CAGATGGTTT CGGCAGGCGG CAGCGCAGGG GGTAGCCCAA 

451 GCCCAAAACA ATTTGGGCGT GATGTATGCC GAAAGACGCG GCGTGCGCCA 

501 AGACCGCGCC CTTGCACAAG AATGGTTTGG CAAGGCTTGT CAAAACGGAG 

551 ACCAAGACGG CTGCGACAAT GACCAACGCC TGAAGGCGGG TTATTGA 

This corresponds to the amino acid sequence <SEQ ID 4; ORF37-1>: 

1 MKQTVKWLAA ALIALGLNRA VWADDVSDFR ENLQAAAQGN AAAQYNLGAM
51 YYKGRGVRRD DAEAVRWYRQ AAEQGLAQAQ YNLGWMYANG RGVRQDDTEA
101 VRWYRQAAAQ GVVQAQYNLG VIYAEGRGVR QDDVEAVRWF RQAAAQGVAQ
151 AQNNLGVMYA ERRGVRQDRA LAQEWFGKAC QNGDQDGCDN DQRLKAGY*
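
The correspondence between a nucleotide sequence and its encoded protein can be checked with a short translation routine. The Python sketch below assumes the standard genetic code and reading frame 1; the function name is hypothetical, and codons containing ambiguous bases are reported as X.

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; whitespace is ignored."""
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        # Codons with ambiguous bases (e.g. N or '.') are reported as X.
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


# First 30 nucleotides of SEQ ID 3 as printed above.
print(translate("ATGAAACAGA CAGTCAAATG GCTTGCCGCC"))  # MKQTVKWLAA
```

The printed peptide matches the first ten residues of SEQ ID 4.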

Further work identified the corresponding gene in strain A of N. meningitidis <SEQ ID 5>: 



1 ATGAAACAGA CAGTCAAATG GCTTGCCGCC GCCCTGATTG CCTTGGGCTT 

51 GAACCAAGCG GTGTGGGCGG ATGACGTATC GGATTTTCGG GAAAACTTGC 

101 AGGCGGCAGC ACAGGGAAAT GCAGCAGCCC AAAACAATTT GGGCGTGATG 

151 TATGCCGAAA GACGCGGCGT GCGCCAAGAC CGCGCCCTTG CACAAGAATG 

201 GCTTGGCAAG GCTTGTCAAA ACGGATACCA AGACAGCTGC GACAATGACC 

251 AACGCCTGAA AGCGGGTTAT TGA 

This encodes a protein having amino acid sequence <SEQ ID 6; ORF37a>: 



1 MKQTVKWLAA ALIALGLNQA VWADDVSDFR ENLQAAAQGN AAAQNNLGVM 
51 YAERRGVRQD RALAQEWLGK ACQNGYQDSC DNDQRLKAGY* 



The originally-identified partial strain B sequence (ORF37) shows 68.0% identity over a 75aa 
overlap with ORF37a: 

10 20 30 40 50 60 

orf37.pep MKQTVXMLAAALIALGLNRPVWXDDVSDFRENLXAAAQGNAAAQYNLGAMYXQRTRVRRD 
Mill I I I I I II I M I : II I I I I I I M I I I II I I I I I I I 111:11 : I I I : I 
orf37a MKQTVKWLAAALIALGLNQAVWADDVSDFRENLQAAAQGNAAAQNNLGVMYAERRGVRQD 
10 20 30 40 50 60 



70 80 90 100 110 120 

orf37.pep DAEAVRWYRQPAEQGLAQAQYNLGWMYANGRXVRQDDTEAVRWYRQAAAQGVVQAQYNLG 

11:1 : : : I 
orf37a RALAQEWLGKACQNGYQDSCDNDQRLKAGYX 

70 80 90 
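
Percent identity figures of the kind quoted in these comparisons can be recomputed from a pair of aligned strings. The Python sketch below is a minimal illustration with a hypothetical function name; it assumes that every aligned column counts toward the overlap (a gap column, written as '-', counts as a mismatch), which may differ from the program used to generate the alignments.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns in which both sequences carry the same residue.

    Assumption: the denominator is the number of aligned columns, gaps included.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * identical / len(aligned_a)


# A 12-residue stretch from ORF37-1 (SEQ ID 4) against the matching
# stretch from ORF37ng (SEQ ID 8).
print(percent_identity("RALAQEWFGKAC", "LALAQQWLGKAC"))  # 75.0
```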

Further work identified the corresponding gene in N. gonorrhoeae <SEQ ID 7>: 



1 ATGAAACAGA CAGTCAAATG GCTTGCCGCC GCCCTGATTG CCTTGGGCTT 

51 GAACCAAGCG GTGTGGGCGG GTGACGTATC GGATTTTCGG GAAAACTTGC 

101 AGgcggcaGA ACaggGAAAT GCAGCAGCCC AATTCAATTT GGGCGTGATG 

151 TATGAAAATG GACAAGGAGT TCGTCAAGAT TATGTACAGG CAGTGCAGTG 

201 GTATCGCAAG GCTTCAGAAC AAGGGGATGC CCAAGCCCAA TACAATTTGG 

251 GCTTGATGTA TTACGATGGA CGCGGCGTGC GCCAAGACCT TGCGCTCGCT 

301 CAACAATGGC TTGGCAAGGC TTGTCAAAAC GGAGACCAAA ACAGCTGCGA 

351 CAATGACCAA CGCCTGAAGG CGGGTTATTA A 
This encodes a protein having amino acid sequence <SEQ ID 8; ORF37ng>: 

1 MKQTVKWLAA ALIALGLNQA VWAGDVSDFR ENLQAAEQGN AAAQFNLGVM 
51 YENGQGVRQD YVQAVQWYRK ASEQGDAQAQ YNLGLMYYDG RGVRQDLALA 
101 QQWLGKACQN GDQNSCDNDQ RLKAGY* 

The originally-identified partial strain B sequence (ORF37) shows 64.9% identity over a 111aa 
overlap with ORF37ng: 

orf37.pep MKQTVXMLAAALIALGLNRPVWXDDVSDFRENLXAAAQGNAAAQYNLGAMYXQRTRVRRD 60 

Mil! I I I I I I I I I M : I! I I I I I 1 I I I I I I I I I I I I : I II : I I : t I : I 
orf37ng MKQTVKWLAAALIALGLNQAVWAGDVSDFRENLQAAEQGNAAAQFNLGVMYENGQGVRQD 60 

orf37.pep DAEAVRWYRQPAEQGLAQAQYNLGWMYANGRXVRQDDTEAVRWYRQAAAQGVVQAQYNLG 120 

: : 1 I : ( I I : : M I I I I I I I I I I I : I I I I I i : I : t : I : I 
orf37ng YVQAVQWYRKASEQGDAQAQYNLGLMYYDGRGVRQDLALAQQWLGKACQNGDQNSCDNDQ 120 

orf37.pep VIYAEGRGVRQDDVEAVRWFRQAAAQGVAQAQNNLGVMYAERXRVRQD 168 

orf37ng RLKAGY 126 

The complete strain B sequence (ORF37-1) and ORF37ng show 51.5% identity in 198 aa overlap: 

10 20 30 40 50 60 

orf37-1.pep MKQTVKWLAAALIALGLNRAVWADDVSDFRENLQAAAQGNAAAQYNLGAMYYKGRGVRRD 

I I I I I I 1 I I I I I I M I I I ■' I I I I I It I I I I I I I I I I II I II I : I I I : I I : I : I I I : I 
orf37ng MKQTVKWLAAALIALGLNQAVWAGDVSDFRENLQAAEQGNAAAQFNLGVMYENGQGVRQD 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf37-1.pep DAEAVRWYRQAAEQGLAQAQYNLGWMYANGRGVRQDDTEAVRWYRQAAAQGVVQAQYNLG 
: : I I : M I : I : I I I I I I II I I I II : I I I I I I I 

orf37ng YVQAVQWYRKASEQGDAQAQYNLGLMYYDGRGVRQD 

70 80 90 

130 140 150 160 170 180 

orf37-1.pep VIYAEGRGVRQDDVEAVRWFRQAAAQGVAQAQNNLGVMYAERRGVRQDRALAQEWFGKAC 

I I I I : I : I I I I 

orf37ng LALAQQWLGKAC 

100 



190 199 
orf37-1.pep QNGDQDGCDNDQRLKAGYX 
I I I I I : : I I I I I I II I I I I 
orf37ng QNGDQNSCDNDQRLKAGYX 

110 120 

Computer analysis of these amino acid sequences indicates a putative leader sequence, and it was 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 



ORF37-1 (11kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described 
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure 
1A shows the results of affinity purification of the GST-fusion protein, and Figure 1B shows the 
results of expression of the His-fusion in E.coli. Purified GST-fusion protein was used to immunise 
mice, whose sera were used for ELISA (positive result), FACS analysis (Figure 1C), and a 
bactericidal assay (Figure 1D). These experiments confirm that ORF37-1 is a surface-exposed 
protein, and that it is a useful immunogen. 

Figure 1E shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF37-1. 

Example 2 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 9>: 

TTCGGCGA CATCGGCGGT TTGAAGGTCA ATGCCCCCGT CAAATCCGCA 
GGCGTATTGG TCGGGCGCGT CGGCGCTATC GGACTTGACC CGAAATCCTA 
TCAGGCGAGG GTGCGCCTCG ATTTGGACGG CAAGTATCAG TTCAGCAGCG 
ACGTTTCCGC GCAAATCCTG ACTTCsGGAC TTTTGGGCGA GCAGTACATC 
GGGCTGCAGC AGGGCGGCGA CACGGAAAAC CTTGCTGCCG GCGACACCAT 
CTCCGTAACC AGTTCTGCAA TGGTTCTGGA AAACCTTATC GGCAAATTCA 
TGACGAGTTT TGCCGAGAAA AATGCCGACG GCGGCAATGC GGAAAAAGCC 
GCCGAATAA 

This corresponds to the amino acid sequence <SEQ ID 10>: 

1 FGDIGGLKVN APVKSAGVLV GRVGAIGLDP KSYQARVRLD LDGKYQFSSD 
51 VSAQILTSGL LGEQYIGLQQ GGDTENLAAG DTISVTSSAM VLENLIGKFM 
101 TSFAEKNADG GNAEKAAE* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a hypothetical H.influenzae protein (yrbd.haein; accession number P45029) 
SEQ ID 9 and yrbd.haein show 48.4% aa identity in 122 aa overlap: 

20 30 40 50 60 70 

yrbd.h LGIGALVFLGLRVANVQGFAETKSYTVTATFDNIGGLKVRAPLKIGGVVIGRVSAITLDE 

I : : I I II i i : I I : I : t I : : I I I : I I : I I 
N.m FGDIGGLKVNAPVKSAGVLVGRVGAIGLDP 
10 20 30 

80 90 100 110 120 130 

yrbd.h KSYLPKVSIAINQEYNEIPENSSLSIKTSGLLGEQYIALTMGFDDGDTAMLKNGSQIQDT 
j | | : : | : : : : : : I : : : : : I I I I II I I I I I I : I I I I I : I : I : I I 

N.m KSYQARVRLDLDGKY-QFSSDVSAQILTSGLLGEQYIGLQQG GDTENLAAGDTISVT 

40 50 60 70 80 

140 150 160 

yrbd.h TSAMVLEDLIGQFL--YGSKKSDGNEKSESTEQ 
35 : | | | | | | : 1 I I : I : :::(:: I I : : : : : : I : 

N.m SSAMVLENLIGKFMTSFAEKNADGGNAEKAAEX 
90 100 110 120 

Homology with a predicted ORF from N. gonorrhoeae 
SEQ ID 9 shows 99.2% identity over a 118aa overlap with a predicted ORF from N. gonorrhoeae: 

               20        30        40        50        60        70
yrbd   GAAAVAFLAFRVAGGAAFGGSDKTYAVYADFGDIGGLKVNAPVKSAGVLVGRVGAIGLDP
                                     ||||||||||||||||||||||||||||||
N.m                                  FGDIGGLKVNAPVKSAGVLVGRVGAIGLDP
                                             10        20        30

               80        90       100       110       120       130
yrbd   KSYQARVRLDLDGKYQFSSDVSAQILTSGLLGEQYIGLQQGGDTENLAAGDTISVTSSAM
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
N.m    KSYQARVRLDLDGKYQFSSDVSAQILTSGLLGEQYIGLQQGGDTENLAAGDTISVTSSAM
               40        50        60        70        80        90

              140       150       160
yrbd   VLENLIGKFMTSFAEKNAEGGNAEKAAEX
       ||||||||||||||||||:||||||||||
N.m    VLENLIGKFMTSFAEKNADGGNAEKAAEX
              100       110       120



The complete yrbd H.influenzae sequence has a leader sequence and it is expected that the 
full-length homologous N. meningitidis protein will also have one. This suggests that it is either a 
membrane protein, a secreted protein, or a surface protein and that the protein, or one of its 
epitopes, could be a useful antigen for vaccines or diagnostics. 



Example 3 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 11>: 

1 .ATTTTGATAT ACCTCATCCG CAAGAATCTA GGTTCGCCCG TCTTCTTCTT
51 TCAGGAACGC CCCGGAAAGG ACGGAAAACC TTTTAAAATG GTCAAATTCC
101 GTTCCATGCG CGACGGCTTG TATTCAGACG GCATTCCGCT GCCCGACGGA
151 GAACGCCTGA CACCGTTCGG CAAAAAACTG CGTGCCGcCA GTwTGGACGA
201 ACTGCCTGAA TTATGGAATA TCTTAAAAGG CGAGATGAGC CTGGTCGGCC
251 CCCGCCCGCT GCTGATGCAA TATCTGCCGC TGTACGACAA CTTCCAAAAC
301 CGCCGCCACG AAATGAAACC CGGCATTACC GGCTGGGCGC AGGTCAACGG
351 GCGCAACGCg CTTTCGTGGG ACGAAAAATT CGCCTGCGAT GTTTGGTATA
401 TCGACCACTT CAGCCTGTGC CTCGACATCA AAATCCTACT GCTGACGGTT
451 AAAAAAGTAT TAATCAAGGA AGGGATTTCC GCACAGGGCG AACA.aCCAT
501 GCCCCCTTTC ACAGGAAAAC GCAAACTCGC CGTCGTCGGT GCGGGCGGAC
551 ACGGAAAAGT CGTTGCCGAC CTTGCCGCCG CACTCGGCCG GTACAGGGAA
601 ATCGTTTTTC TGGACGACCG CGCACAAGGC AGCGTCAACG GCTTTTCCGT
651 CATCGGCACG ACGCTGCTGC TTGAAAACAG TTTATCGCCC GAACAATACG
701 ACGTCGCCGT CGCCGTCGGC AACAACCGCA TCCGCCGCCA AATCGCCGAA
751 AAAGCCGCCG CGCTCGGCTT CGCCCTGCCC GTACTGGTTC ATCCGGACGC
801 GACCGTCTCG CCTTCTGCAA CAGTCGGACA AGGCAGCGTC GTTATGGCGA
851 AAGCGGTCG..



This corresponds to the amino acid sequence <SEQ ID 12; ORF3>: 

1 .ILIYLIRKNL GSPVFFFQER PGKDGKPFKM VKFRSMRDGL YSDGIPLPDG
51 ERLTPFGKKL RAASXDELPE LWNILKGEMS LVGPRPLLMQ YLPLYDNFQN
101 RRHEMKPGIT GWAQVNGRNA LSWDEKFACD VWYIDHFSLC LDIKILLLTV
151 KKVLIKEGIS AQGEXTMPPF TGKRKLAVVG AGGHGKVVAD LAAALGRYRE
201 IVFLDDRAQG SVNGFSVIGT TLLLENSLSP EQYDVAVAVG NNRIRRQIAE
251 KAAALGFALP VLVHPDATVS PSATVGQGSV VMAKAV..



Further sequence analysis revealed the complete nucleotide sequence <SEQ ID 13>: 

1 ATGAGTAAAT TCTTCAAACG CCTGTTTGAC ATTGTTGCCT CCGCCTCGGG
51 ACTGATTTTC CTCTCGCCAG TATTTTTGAT TTTGATATAC CTCATCCGCA
101 AGAATCTAGG TTCGCCCGTC TTCTTCTTTC AGGAACGCCC CGGAAAGGAC
151 GGAAAACCTT TTAAAATGGT CAAATTCCGT TCCATGCGCG ACGCGCTTGA
201 TTCAGACGGC ATTCCGCTGC CCGACGGAGA ACGCCTGACA CCGTTCGGCA
251 AAAAACTGCG TGCCGCCAGT TTGGACGAAC TGCCTGAATT ATGGAATATC
301 TTAAAAGGCG AGATGAGCCT GGTCGGCCCC CGCCCGCTGC TGATGCAATA
351 TCTGCCGCTG TACGACAACT TCCAAAACCG CCGCCACGAA ATGAAACCCG
401 GCATTACCGG CTGGGCGCAG GTCAACGGGC GCAACGCGCT TTCGTGGGAC
451 GAAAAATTCG CCTGCGATGT TTGGTATATC GACCACTTCA GCCTGTGCCT
501 CGACATCAAA ATCCTACTGC TGACGGTTAA AAAAGTATTA ATCAAGGAAG
551 GGATTTCCGC ACAGGGCGAA GCCACCATGC CCCCTTTCAC AGGAAAACGC
601 AAACTCGCCG TCGTCGGTGC GGGCGGACAC GGAAAAGTCG TTGCCGACCT
651 TGCCGCCGCA CTCGGCCGGT ACAGGGAAAT CGTTTTTCTG GACGACCGCG
701 CACAAGGCAG CGTCAACGGC TTTTCCGTCA TCGGCACGAC GCTGCTGCTT
751 GAAAACAGTT TATCGCCCGA ACAATACGAC GTCGCCGTCG CCGTCGGCAA
801 CAACCGCATC CGCCGCCAAA TCGCCGAAAA AGCCGCCGCG CTCGGCTTCG
851 CCCTGCCCGT TCTGGTTCAT CCGGACGCGA CCGTCTCGCC TTCTGCAACA
901 GTCGGACAAG GCAGCGTCGT TATGGCGAAA GCCGTCGTAC AGGCAGGCAG
951 CGTATTGAAA GACGGCGTGA TTGTGAACAC TGCCGCCACC GTCGATCACG
1001 ACTGCCTGCT TAACGCTTTC GTCCACATCA GCCCAGGCGC GCACCTGTCG
1051 GGCAACACGC ATATCGGCGA AGAAAGCTGG ATAGGCACGG GCGCGTGCAG
1101 CCGCCAGCAG ATCCGTATCG GCAGCCGCGC AACCATTGGA GCGGGCGCAG
1151 TCGTCGTACG CGACGTTTCA GACGGCATGA CCGTCGCGGG CAATCCGGCA
1201 AAGCCGCTGC CGCGCAAAAA CCCCGAGACC TCGACAGCAT AA

This corresponds to the amino acid sequence <SEQ ID 14; ORF3-1>: 

1 MSKFFKRLFD IVASASGLIF LSPVFLILIY LIRKNLGSPV FFFQERPGKD
51 GKPFKMVKFR SMRDALDSDG IPLPDGERLT PFGKKLRAAS LDELPELWNI
101 LKGEMSLVGP RPLLMQYLPL YDNFQNRRHE MKPGITGWAQ VNGRNALSWD
151 EKFACDVWYI DHFSLCLDIK ILLLTVKKVL IKEGISAQGE ATMPPFTGKR
201 KLAVVGAGGH GKVVADLAAA LGRYREIVFL DDRAQGSVNG FSVIGTTLLL
251 ENSLSPEQYD VAVAVGNNRI RRQIAEKAAA LGFALPVLVH PDATVSPSAT
301 VGQGSVVMAK AVVQAGSVLK DGVIVNTAAT VDHDCLLNAF VHISPGAHLS
351 GNTHIGEESW IGTGACSRQQ IRIGSRATIG AGAVVVRDVS DGMTVAGNPA
401 KPLPRKNPET STA*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF3 shows 93.0% identity over a 286aa overlap with an ORF (ORF3a) from strain A of N. 
meningitidis: 

10 20 30 

orf3.pep ILIYLIRKNLGSPVFFFQERPGKDGKPFKMVKFR 

I M I I I I I II I ) II I I I I I I I I I ! I I I 1 I I 1 I I I 
orf3a MSKFFKRLFDIVASASGLIFLSPVFLILIYLIRKNLGSPVFFFQERPGKDGKPFKMVKFR 
10 20 30 40 50 60 

40 50 60 70 80 90 

orf3.pep SMRDGLYSDGIPLPDGERLTPFGKKLRAASXDELPELWNILKGEMSLVGPRPLLMQYLPL 
I I : I : 1 MM M I II M I I ! I M M M ! I I M M I I : I I I : I M I I M I I 1 I I I I M 
orf3a SMHDALDSDGILLPDGERLTPFGKKLRAASLDELPELWNVLKGDMSLVGPRPLLMQYLPL 
70 80 90 100 110 120 

100 110 120 130 140 150 

orf 3 . pep YDNFQNRRHEMKPGITGWAQVNGRNALSWDEKFACDVWYIDHFS LCLDIKILLLTVKKVL 
I I I I I 1 I I I I I I I I I I I I I 11 I I I I I I I I 1 I : I I 1 I : II I I M I I II II M I I I 1 I I I I I 
orf 3a YDNFQNRRHEMKPGITGWAQVNGRNALSWDERFACDIWYIDHFS LCLDIKILLLTVKKVL 
130 140 150 160 170 180 

160 170 180 190 200 210 

orf 3 . pep IKEGISAQGEXTMPPFTGKRKLAVVGAGGHGKVVADLAAALGRYREIVFLDDRAQGSVNG 
I I M I M II 1 M M M M M I M II I M I I I t M : I I M II I M M I I M : I M II I 
orf 3a IKEGISAQGEATMPPFTGKRKLAVVGAGGHGKWAELAAALGTYGEIVFLDDRVQGSVNG 
190 200 210 220 230 240 

220 230 240 250 260 270 

orf 3 . pep FSVIGTTLLLENSLSPEQYDVAVAVGNNRIRRQIAEKAAALGFALPVLVHPDATVSPSAT 
I M I I M II M I II M I : I : i M I I I II I M M I I I M M II II M I : I II : II I M M 
orf 3a FPVIGTTLLLENSLSPEQFDIAVAVGNNRIRRQIAEKAAALGFALPVLIHPDSTVSPSAT 
250 260 270 280 290 300 

280 

orf 3 . pep VGQGSVVMAKAV 
I I I I : I I I I I I I 

orf 3a VGQGGVVMAKAVVQADSVLKDGVIVNTAATVDHDCLLDAFVHISPGAHLSGNTRIGEESW 
310 320 330 340 350 360 



The complete length ORF3a nucleotide sequence <SEQ ID 15> is: 



1 ATGAGTAAAT TCTTCAAACG CCTGTTTGAC ATTGTTGCCT CCGCCTCGGG 
51 ACTGATTTTC CTCTCGCCAG TATTTTTGAT TTTGATATAC CTCATCCGCA 
101 AGAATCTGGG TTCGCCCGTC TTCTTCTTTC AGGAACGCCC CGGAAAGGAC 
151 GGAAAACCTT TTAAAATGGT CAAATTCCGT TCCATGCACG ACGCGCTTGA 
201 TTCAGACGGC ATTCTGCTGC CCGACGGAGA ACGCCTGACA CCGTTCGGCA 
251 AAAAACTGCG TGCCGCCAGT TTGGACGAAC TGCCCGAACT GTGGAACGTC 
301 CTCAAAGGCG ACATGAGCCT GGTCGGCCCC CGCCCGCTGC TGATGCAATA 
351 TCTGCCGCTG TACGACAACT TCCAAAACCG CCGCCACGAA ATGAAACCGG 
401 GCATTACCGG CTGGGCGCAG GTCAACGGGC GCAACGCGCT TTCGTGGGAC 
451 GAACGCTTCG CATGCGACAT CTGGTATATC GACCACTTCA GCCTGTGCCT 
501 CGACATCAAA ATCCTACTGC TGACGGTTAA AAAAGTATTA ATCAAAGAAG 
551 GGATTTCCGC ACAGGGCGAA GCCACCATGC CCCCTTTCAC AGGAAAACGC 
601 AAACTTGCCG TCGTCGGTGC GGGCGGACAC GGCAAAGTCG TTGCCGAGCT 
651 TGCCGCCGCA CTCGGCACAT ACGGCGAAAT CGTTTTTCTG GACGACCGCG 
701 TCCAAGGCAG CGTCAACGGC TTCCCCGTCA TCGGCACGAC GCTGCTGCTT 
751 GAAAACAGTT TATCGCCCGA ACAATTCGAC ATCGCCGTCG CCGTCGGCAA 
801 CAACCGCATC CGCCGCCAAA TCGCCGAAAA AGCCGCCGCG CTCGGCTTCG 
851 CCCTGCCCGT CCTGATTCAT CCGGACTCGA CCGTCTCGCC TTCTGCAACA 
901 GTCGGACAAG GCGGCGTCGT TATGGCGAAA GCCGTCGTAC AGGCTGACAG 
951 CGTATTGAAA GACGGCGTAA TTGTGAACAC TGCCGCCACC GTCGATCACG 
1001 ATTGCCTGCT TGATGCTTTC GTCCACATCA GCCCGGGCGC GCACCTGTCG 
1051 GGCAACACGC GTATCGGCGA AGAAAGCTGG ATAGGCACAG GCGCGTGCAG 
1101 CCGCCAGCAG ATCCGTATCG GCAGCCGCGC AACCATTGGA GCGGGCGCAG 
1151 TCGTCGTGCG CGACGTTTCA GACGGCATGA CCGTCGCGGG CAACCCGGCA 
1201 AAACCATTGG CAGGCAAAAA TACCGAGACC CTGCGGTCGT AA 

This is predicted to encode a protein having amino acid sequence <SEQ ID 16>: 

1 MSKFFKRLFD IVASASGLIF LSPVFLILIY LIRKNLGSPV FFFQERPGKD 
51 GKPFKMVKFR SMHDALDSDG ILLPDGERLT PFGKKLRAAS LDELPELWNV 
101 LKGDMSLVGP RPLLMQYLPL YDNFQNRRHE MKPGITGWAQ VNGRNALSWD 
151 ERFACDIWYI DHFSLCLDIK ILLLTVKKVL IKEGISAQGE ATMPPFTGKR 
201 KLAVVGAGGH GKVVAELAAA LGTYGEIVFL DDRVQGSVNG FPVIGTTLLL 
251 ENSLSPEQFD IAVAVGNNRI RRQIAEKAAA LGFALPVLIH PDSTVSPSAT 
301 VGQGGVVMAK AVVQADSVLK DGVIVNTAAT VDHDCLLDAF VHISPGAHLS 
351 GNTRIGEESW IGTGACSRQQ IRIGSRATIG AGAVVVRDVS DGMTVAGNPA 
401 KPLAGKNTET LRS* 

Two transmembrane domains are underlined. 
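
The relationship between a nucleotide listing such as <SEQ ID 15> and its predicted protein <SEQ ID 16> can be checked by conceptual translation. The following is a minimal sketch only, assuming the standard genetic code and reading frame 1; the names `CODON_TABLE` and `translate` are illustrative and do not come from the source, and the source does not state which software was used for translation.

```python
# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1, stopping after the first stop codon ('*').

    Whitespace is ignored, so a row copied from a sequence listing can be
    passed in directly. Codons containing ambiguous bases are reported as 'X'.
    """
    seq = "".join(dna.split()).upper()
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE.get(seq[pos:pos + 3], "X")
        protein.append(residue)
        if residue == "*":
            break
    return "".join(protein)


# First and last rows of <SEQ ID 15> (rows 1 and 1201 of the listing).
first_row = "ATGAGTAAAT TCTTCAAACG CCTGTTTGAC ATTGTTGCCT CCGCCTCGGG"
last_row = "AAACCATTGG CAGGCAAAAA TACCGAGACC CTGCGGTCGT AA"
print(translate(first_row))
print(translate(last_row))
```

Row 1201 begins at a codon boundary (1200 is divisible by 3), which is why it can be translated on its own here; an arbitrary row of the listing would generally need a frame offset.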



ORF3-1 shows 94.6% identity in 410 aa overlap with ORF3a: 






10 20 30 40 50 60 

orf 3a . pep MSKFFKRLFDIVASASGLIFLSPVFLILIYLIRKNLGSPVFFFQERPGKDGKPFKMVKFR 
I i I I I I I I I I ! I I I I t I ! M I I I I I I I I I I I I I i If I I I I I M I I I I i I I I I M I i I M I 
orf 3-1 MSKFFKRLFDIVASASGLIFLSPVFLILIYLIRKNLGSPVFFFQERPGKDGKPFKMVKFR 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 3a . pep SMHDALDSDGILLPDGERLTPFGKKLRAASLDELPELWNVLKGDMSLVGPRPLLMQYLPL 
11:11111111 1 I 1 I ! 1 1 I 1 I 1 I I I I I I I ! I I M I II I : I I t : I I ! I I i I I I M I I I I I 
orf 3-1 SMRDALDSDGIPLPDGERLTPFGKKLRAASLDELPELWNILKGEMSLVGPRPLLMQYLPL 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 3a . pep YDNFQNRRHEMKPGITGWAQVNGRNALSWDERFACDIWYIDHFSLCLDIKILLLTVKKVL 
t 1 II I 1 1 M 1 I M 1 I I 1 I I I I I I I I I 1 I I II : M I I : I I ! M IS I i II I II I I I i I I I I I 
orf 3-1 YDNFQNRRHEMKPGITGWAQVNGRNALSWDEKFACDVWYIDHFSLCLDIKILLLTVKKVL 

130 140 150 160 170 180 






190 200 210 220 230 240 

orf 3a . pep IKEGISAQGEATMPPFTGKRKLAWGAGGHGKWAELAAALGTYGEIVFLDDRVQGSVNG 
1 I I I II I II I I I I I I I 1 I I I I I I i I M I I I I I I I I : M I I I I I I I I II 11 I : I M I I I 
orf 3-1 I KEG I SAQGE ATMPPFTGKRKLAVVGAGGHGKWADLAAALGRYRE I VFLDDRAQG S VNG 

190 200 210 220 230 240 






250 260 270 280 290 300 

orf 3a . pep FPVIGTTLLLENSLSPEQFDIAVAVGNNRIRRQIAEKAAALGFALPVLIHPDSTVSPSAT 
I i 1 I 1 II I I 1 I I I I I I I : 1 : I I 1 I I I I I I II I I I I 1 I I I I 1 I I I I M : I I I : I 1 I I I 1 I 
orf 3-1 FSVIGTTLLLENSLSPEQYDVAVAVGNNRIRRQIAEKAAALGFAL PVLVHPDATVSPSAT 

250 260 270 280 290 300 
310 320 330 340 350 360 

orf3a pep VGQGGWMAKAWQADSVLKDGVIVNTAATVDHDCLLDAFVHISPGAHLSGNTRIGEESW 
| | M : | j I M M II I I I I I I I II I I I M II II I I I I : I I II I I I 1 I I 1 I M I i I 1 I 1 I I 
5 orf 3-1 VGQGSWMAKAVVQAGSVLKDGVIVNTAATVDHDCLLNAFVHISPGAHLSGNTHIGEESW 

310 320 330 340 350 360 

370 380 390 400 410 

orf3a pep I G T G AC SRQQIRIGS RAT I GAG AV WR D V S DGMT V AGN P AK P LAGKN T E T LRS X 

10 I I I I I I M II I I I M I M I I I I I I I I i M I I I I I I I i I I I I I I MM 

orf 3-1 IGTGACSRQQIRIGSRATIGAGAVWRDVSDGMTVAGNPAKPLPRKNPETSTAX 

370 380 390 400 410 
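
The identity figures quoted for these alignments can be reproduced column by column. The sketch below is illustrative only: the function name `percent_identity` is hypothetical, it assumes the two strings are already aligned to equal length with `-` marking gaps, and it counts every aligned column in the denominator, which may differ from the convention of the alignment program used in the source. The example rows are residues 61-120 of ORF3a and ORF3-1 as shown in the alignment above.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of aligned columns holding the same residue in both rows.

    Both rows must already be aligned (equal length, '-' for gaps).
    A gap never counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Residues 61-120 of the ORF3a / ORF3-1 alignment.
orf3a_block = "SMHDALDSDGILLPDGERLTPFGKKLRAASLDELPELWNVLKGDMSLVGPRPLLMQYLPL"
orf3_1_block = "SMRDALDSDGIPLPDGERLTPFGKKLRAASLDELPELWNILKGEMSLVGPRPLLMQYLPL"
print(round(percent_identity(orf3a_block, orf3_1_block), 1))
```

This block differs at four of its 60 positions (H/R, L/P, V/I and D/E), so it is slightly less conserved than the 94.6% reported for the full 410 aa overlap.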

Homology with hypothetical protein encoded by yvfc gene (accession Z71928) of B. subtilis 
ORF3 and YVFC proteins show 55% aa identity in 170 aa overlap (BLASTp): 

ORF3 3 IYLIRKNLGSPVFFFQERPGKDGKPFKMVKFRSMRDGLYSDGIPLPDGERLTPFGKKLRA 62 
I ++R +GSPVFF Q RPG GKPF + KFR+M D S G LPD RLT G+ +R 
yvfc 27 IAVVRLKIGSPVFFKQVRPGLHGKPFTLYKFRTMTDERDSKGNLLPDEVRLTKTGRLIRK 86 

ORF3 63 
S DELP+L N+LKG++SLVGPRPLLM YLPLY Q RRHE+KPGITGWAQ+NGRNA+S 
yvfc 87 

ORF3 123 
W++KF DVWY+D++S LD EGI T FTG 
yvfc 147 WEKKFELDVWYVDNWSFFLDLKILCLTVRKVLVSEGIQQTNHVTAERFTG 196 

Homology with a predicted ORF from N. gonorrhoeae 

ORF3 shows 86.3% identity over a 286aa overlap with a predicted ORF (ORF3.ng) from N. 
gonorrhoeae: 

orf3 ILIYLI RKNLGSPVFFFQERPGKDGKPFKMVKFR 34 

: I M I II M I M M I :: I I I I 1 I I I I I I M 1 I I 
orf3ng MSKAVKRLFDIIASA SGLIVLSPVFLVLIYLI RKNKGSPVFFIRERPGKDGKPFKMVKFR 60 

35 orf3 SMRDGLYSDGIPLPDGERLTPFGKKLRAASXDELPELWNILKGEMSLVGPRPLLMQYLPL 94 

M M : I II I II II M M I I I II M M : I II M M M : II M I II I I M I M I II I II 
orf 3ng SMRDALDSDGIPLPDSERLTDFGKKLRATSLDELPELWNVLKGEMSLVGPRPLLMQYLPL 120 

orf3 Y DN FQN RRHEMK P G I T G W AQ VN GRN AL S W DE K F AC D VW YIDHFSLCLDIKI LLLT VKKVL 154 

40 I :: M ! I M I !!! I ! 1 I I I I I M I ! I I I I II II I M I I I : I I : I I : II I : II II II I 

orf3ng YNKFQNRRHEMKPGITGWAQVNGRNALSWDEKFSCDVWYTDNFSFWLDMKILFLTVKKVL 180 

orf 3 IKEGISAQGEXTMPPFTGKRKLAWGAGGHGKVVADLAAALGRYREIVFLDDRAQGSVNG 214 

I II I II M I I 11 I II : I : ! I II I : I I I M I II I I : M I M I 1 M I I II M : M I I II 
45 orf3ng IKEGISAQGEATMPPFAGNRKLAVIGAGGHGKVVAELAAALGTYGEIVFLDDRTQGSVNG 240 






orf 3 FSVIGTTLLLENSLSPEQYDVAVAVGNNRIRRQIAEKAAALGFALPVLVHPDATVSPSAT 274 

I I I I I I I I I M I I I I I I : I : : I M I I I I M I I I i : I I I I M M I I : II i M M I M 

orf3ng FPVIGTTLLLENSLSPEQFDITVAVGNNRIRRQITENAAALGFKLPVLIHPDATVSPSAI 300 

orf3 VGQGSVVMAKAV 286 

: I I I I I I I I I I I 

orf3ng IGQGSWMAKAWQAGSVLKDGVIVNTAATVDHDCLLDAFVHISPGAHLSGNTRIGEESR 360 



The complete length ORF3ng nucleotide sequence <SEQ ID 17> is: 

1 ATGAGTAAAG CCGTCAAACG CCTGTTCGAC ATCATCGCAT CCGCATCGGG 

51 GCTGATTGTC CTGTCGCCCG TGTTTTTGGT TTTAATATAC CTCATCCGCA 

101 AAAACTTAGG TTCGCCCGTC TTCTTCattC GGGAACGCCc cgGAAAGGAc 

151 ggaaaacCTT TTAAAATGGT CAAATTCCGT TCCAtgcgcg acgcgcttGA 

201 TTCAGACGGC ATTCCGCTGC CCGATAGCGA ACGCCTGACC GATTTCGGCA 

251 AAAAATTACG CGCCACCAGT TTGGACGAAC TTCCTGAATT ATGGAATGTC 

301 CTCAAAGGCG AGATGAGCCT GGTCGGCCCC CGCCCGCTTT TGATGCAGTA 
351 TCTGCCGCTT TACAACAAAT TTCAAAACCG CCGCCACGAA ATGAAACCGG 

401 GCATTACCGG CTGGGCGCAG GTCAACGGGC GCAACGCGCT TTCGTGGGAC 

451 GAAAAGTTCT CCTGCGATGT TTGGTACACC GACAATTTCA GCTTTTGGCT 

501 GGATATGAAA ATCCTGTTTC TGACAGTCAA AAAAGTCTTG ATTAAAGAAG 

551 GCATTTCGGC GCAAGGGGAA GCCACCATGC CCCCTTTCGC GGGGAATCGC 

601 AAACTCGCCG TTATCGGCGC GGGCGGACAC GGCAAAGTCG TTGCCGAGCT 

651 TGCCGCCGCA CTCGGCACAT ACGGCGAAAT CGTTTTTCTG GACGACCGCA 

701 CCCAAGGCAG CGTCAACGGC TTCCCCGTCA TCGGCACGAC GCTGCTGCTT 

751 GAAAACAGTT TATCGCCCGA ACAATTCGAC ATCACCGTCG CCGTCGGCAA 

801 CAACCGCATC CGCCGCCAAA TCACCGAAAA CGCCGCCGCG CTCGGCTTCA 

851 AACTGCCCGT TCTGATTCAT CCCGACGCGA CCGTCTCGCC TTCTGCAATA 

901 ATCGGACAAG GCAGCGTCGT AATGGCGAAA GCCGTCGTAC AGGCCGGCAG 

951 CGTATTGAAA GACGGCGTGA TTGTGAACAC TGCCGCCACC GTCGATCACG 

1001 ACTGCCTGCT TGACGCTTTC GtccaCATCA GCCCGGGCGC GCACCTGTCG 

1051 GGCAACACGC GTATCGGCGA AGAAAGCCGG ATAGGCACGG GCGCGTGCAG 

1101 CCGCCAGCAG ACAACCGTCG GCAGCGGGGT TACCgccgGT GCAGGGgcGG 

1151 TTATCGTATG CGACATCCCG GACGGCATGA CCGTCGCGGG CAACCCGGCA 

1201 AAGCCCCTTA CGGGCAAAAA CCCCAAGACC GGGACGGCAT AA 

This encodes a protein having amino acid sequence <SEQ ID 18>: 



1 MSKAVKRLFD IIASASGLIV LSPVFLVLIY LIRKNLGSPV FFIRERPGKD 
51 GKPFKMVKFR SMRDALDSDG IPLPDSERLT DFGKKLRATS LDELPELWNV 
101 LKGEMSLVGP RPLLMQYLPL YNKFQNRRHE MKPGITGWAQ VNGRNALSWD 
151 EKFSCDVWYT DNFSFWLDMK ILFLTVKKVL IKEGISAQGE ATMPPFAGNR 
201 KLAVIGAGGH GKVVAELAAA LGTYGEIVFL DDRTQGSVNG FPVIGTTLLL 
251 ENSLSPEQFD ITVAVGNNRI RRQITENAAA LGFKLPVLIH PDATVSPSAI 
301 IGQGSVVMAK AVVQAGSVLK DGVIVNTAAT VDHDCLLDAF VHISPGAHLS 
351 GNTRIGEESR IGTGACSRQQ TTVGSGVTAG AGAVIVCDIP DGMTVAGNPA 
401 KPLTGKNPKT GTA* 



This protein shows 86.9% identity in 413 aa overlap with ORF3-1: 



10 20 30 40 50 60 

orf 3-1. pep MSKFFKRLFDIVASASGLIFLSPVFLILIYLIRKNLGSPVFFFQERPGKDGKPFKMVKFR 
III 111111:1111111 I I I 1 I I : 1 1 I N I I I I I I I I I I :: i I I I I I I I I I I I I I M 
orf3ng MSKAVKRLFDIIASASGLIVLSPVFLVLIYLIRKNLGSPVFFIRERPGKDGKPFKMVKFR 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 3-1 . pep SMRDALDSDGIPLPDGERLTPFGKKLRAASLDELPELWNILKGEMSLVGPRPLLMQYLPL 
I I I 1 I ! I I I 1 I I 1 I I : I I 1 I I I I I I II : I I I I 1 I I 1 1 I : 1 I I 1 I I I I 1 I I I ! II I I I I 1 
orf3ng SMRDALDSDG I PLPDSERLTDFGKKLRATSLDELPELWNVLKGEMSLVGPRPLLMQYLPL 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf 3-1 . pep YDNFQNRRHEMKPGITGWAQVNGRNALSWDEKFACDVWYIDHFSLCLDIKILLLTVKKVL 
I ::! I I I I I I I I 1 M I I I I I I I I II I I I I I I I I : I I I M I : I I : I 1 : I M : I M 1 I I 1 
orf3ng YNKFQNRRHEMKPGITGWAQVNGRNALSWDEKFSCDVWYTDNFSFWLDMKILFLTVKKVL 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf 3-1 . pep IKEGISAQGEATMPPFTGKRKLAVVGAGGHGKWADLAAALGRYREIVFLDDRAQGSVNG 
j I I I I I I I II I I M I I : I : I 1 I 1 I : I I I I I I I I I I : I I I I I 1 I I I I I I I I ! : II 1 I 1 I 
orf3ng IKEGISAQGEATMPPFAGNRKLAVIGAGGHGKVVAELAAALGTYGEIVFLDDRTQGSVNG 

190 200 210 220 230 240 



250 260 270 280 290 300 

orf 3-1 . pep FSVIGTTLLLENSLSPEQYDVAVAVGNNRIRRQIAEKAAALGFALPVLVHPDATVSPSAT 

I I 1 I 1 I I I I I I I I I 1 1 1 : i :: I I I I 1 I II I I I I : I : I I I I I I I I I 1 : I 1 I I I I I I 1 I 
orf3ng FPVIGTTLLLENSLSPEQFDITVAVGNNRIRRQITENAAALGFKLPVLIHPDATVSPSAI 

250 260 270 280 290 300 



310 320 330 340 350 360 

orf 3-1 . pep VGQGSWMAKAVVQAGSVLKDGVIVNTAATVDHDCLLNAFVHISPGAHLSGNTHIGEESW 

: I I 1 II H I 1 I 1 I I I I I I I I I I I 1) I 1 I I I I I I I I I I : I I I 1 I I I M I I I I 1 I : I I I I I 
orf3ng IGQGSWMAKAWQAGSVLKDGVIVNTAATVDHDCLLDAFVHISPGAHLSGNTRIGEESR 

310 320 330 340 350 360 



370 380 390 400 410 
orf 3-1 . pep IGTGACSRQQIRIGSRATIGAGAVVVRDVSDGMTVAGNPAKPLPRKNPETSTAX 
I III) II Ml :|| :| Mil 1:1 I: MINIUM III IMMMM 
orf3ng IGTGACSRQQTTVGSGVTAGAGAVIVCDIPDGMTVAGNPAKPLTGKNPKTGTAX 

370 380 390 400 410 

In addition, ORF3ng shows significant homology with a hypothetical protein from B. subtilis: 

gnl|PID|e238668 (Z71928) hypothetical protein [Bacillus subtilis] 
>gi|1945702|gnl|PID|e313004 (Z94043) hypothetical protein [Bacillus subtilis] 
>gi|2635938|gnl|PID|e1186113 (Z99121) similar to capsular polysaccharide 
biosynthesis [Bacillus subtilis] Length = 202 

Score = 235 bits (594), Expect = 3e-61 

Identities = 114/195 (58%), Positives = 142/195 (72%) 

Query: 5 VKRLFDIIASASGLIVLSPVFLVLIYLIRKNLGSPVFFIRERPGKDGKPFKMVKFRSMRD 64 
+KRLFD+ A+ L S + L I ++R +GSPVFF + RPG GKPF + KFR+M D 
Sbjct: 3 LKRLFDLTAAIFLLCCTSVIILFTIAVVRLKIGSPVFFKQVRPGLHGKPFTLYKFRTMTD 62 

Query: 65 ALDSDGIPLPDSERLTDFGKKLRATSLDELPELWNVLKGEMSLVGPRPLLMQYLPLYNKF 124 
DS G LPD RLT G+ +R S+DELP+L N VLKG ++SLVGPRPLLM YLPLY + 
Sbjct: 63 

Query: 125 
Q RRHE+KPGITGWAQ+NGRNA+SW++KF DVWY DN+SF+LD+KIL LTV+KVL+ EG 
Sbjct: 123 

Query: 185 
T F G+ 
Sbjct: 183 QTNHVTAERFTGS 197 

The hypothetical product of the yvfc gene shows similarity to EXOY of R. meliloti, an 
exopolysaccharide production protein. Based on this and on the two predicted transmembrane 
regions in the homologous N. gonorrhoeae sequence, it is predicted that these proteins, or their 
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
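
The source does not state how the transmembrane regions were predicted. As a hedged illustration of one common approach, the sketch below computes a Kyte-Doolittle hydropathy average; the function names and the window length are assumptions, and the hydrophobic segment used as an example (residues 16-32 of <SEQ ID 16>) is chosen for illustration only, since the underlining that marked the predicted domains has not survived in this text.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(segment: str) -> float:
    """Average Kyte-Doolittle score over a peptide segment."""
    return sum(KYTE_DOOLITTLE[res] for res in segment) / len(segment)


def hydropathy_profile(protein: str, window: int = 19) -> list:
    """Sliding-window averages; one value per window start position."""
    return [
        mean_hydropathy(protein[i:i + window])
        for i in range(len(protein) - window + 1)
    ]


# Illustrative segments taken from the start of <SEQ ID 16>.
hydrophobic = "SGLIFLSPVFLILIYLI"   # residues 16-32
hydrophilic = "ERPGKDGKPFK"         # residues 45-55
print(round(mean_hydropathy(hydrophobic), 2))
print(round(mean_hydropathy(hydrophilic), 2))
```

A window average well above zero over roughly 17-20 residues is the usual signature of a candidate membrane-spanning helix in this kind of analysis, while charged stretches score negative.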

Example 4 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 19>: 

1 . . AACCATATGG CGATTGTCAT CGACGAATAC GGCGGCACAT CCGGCTTGGT 

51 CACCTTTGAA GACATCATCG AGCAAATCGT CGGCGAAATC GAAGACGAGT 

101 TTGACGAAGA CGATAGCGCC GACAATATCC ATGCCGTTTC TTCAGACACG 

151 TGGCGCATCC ATGCAGCTAC CGAAATCGAA GACATCAACA CCTTCTTCGG 

201 CACGGAATAC AGCATCGAAG AAGCCGACAC CATT . GGCGG CCTGGTCATT 

251 CAAGAGTTGG GACATCTGCC CGTGCGCGGC GAAAAAGTCC TTATCGGCGG 

301 TTTGCAGTTC ACCGTCGCAC GCGCCGACAA CCGCCGCCTG CATACGCTGA 

351 TGGCGACCCG CGTGAAGTAA GC ACCGC CGTTTCTGCA 

401 CAGTTTAG 

This corresponds to amino acid sequence <SEQ ID 20; ORF5>: 

1 . . NHMAIVIDEY GGTSGLVTFE DIIEQIVGEI EDEFDEDDSA DNIHAVSSDT 
51 WRIHAATEIE DINTFFGTEY SIEEADTIXR PGHSRVGTSA RARRKSPYRR 
101 FAVHRRTRRQ PPPAYADGDP REVS XR RFCTV* 

Further sequence analysis revealed the complete DNA sequence to be <SEQ ID 21>: 

1 ATGGACGGCG CACAACCGAA AACGAATTTT TTTGAACGCC TGATTGCCCG 

51 ACTCGCCCGC GAACCCGATT CCGCCGAAGA CGTATTAAAC CTGCTTCGGC 

101 AGGCGCACGA GCAGGAAGTT TTTGATGCGG ATACGCTTTT AAGATTGGAA 

151 AAAGTCCTCG ATTTTTCCGA TTTGGAAGTG CGCGACGCGA TGATTACGCG 

201 CAGCCGTATG AACGTTTTAA AAGAAAACGA CAGCATCGAG CGCATCACCG 

251 CCTACGTTAT CGATACCGCC CATTCGCGCT TCCCCGTCAT CGGCGAAGAC 
301 AAAGACGAAG 
351 GTTTAACCCC 
401 TCGTCCCCGA 
451 CAGCGCAACC 
501 CTTGGTCACC 
551 ACGAGTTTGA 
601 GAACGCTGGC 
651 CTTCGGCACG 
701 ATTCAAGAGT 
751 CGGTTTGCAG 
801 TGATGGCGAC 
851 TGACGGTACG 

This corresponds to amino acid sequence <SEQ ID 22; ORF5-1>: 

1 MDGAQPKTNF FERLIARLAR EPDSAEDVLN LLRQAHEQEV FDADTLLRLE 

51 KVLDFSDLEV RDAMITRSRM NVLKENDSIE RITAYVIDTA HSRFPVIGED 

101 KDEVLGILHA KDLLKYMFNP EQFHLKSILR PAVFVPEGKS LTALLKEFRE 

151 QRNHMAIVID EYGGTSGLVT FEDIIEQIVG EIEDEFDEDD SADNIHAVSS 

201 ERWRIHAATE IEDINTFFGT EYSSEEADTI RPGHSRVGTS ARARRKSPYR 

251 RFAVHRRTRR QPPPAYADGD PREVSTAVSA QFRMTVRAFS VSIRPIRQT* 

Further work identified the corresponding gene in strain A of N. meningitidis <SEQ ID 23>: 

1 ATGGACGGCG CACAACCGAA AACAAATTTT TTNNAACGCC TGATTGCCCG 

51 ACTCGCCCGC GAACCCGATT CCGCCGAAGA CGTATTGACC CTGTTGCGCC 

101 AAGCGCACGA ACAGGAAGTA TTTGATGCGG ATACGCTTTT AAGATTGGAA 

151 AAAGTCCTCG ATTTTTCTGA TTTGGAAGTG CGCGACGCGA TGATTACGCG 

201 CAGCCGTATG AACGTTTTAA AAGAAAACGA CAGCATCGAA CGCATCACCG 

251 CCTACGTTAT CGATACCGCC CATTCGCGCT TCCCCGTCAT CGGTGAAGAC 

301 AAAGACGAAG TTTTGGGTAT TTTGCACGCC AAAGACCTGC TCAAATATAT 

351 GTTCAACCCC GAGCAGTTCC ACCTCAAATC GATATTGCGC CCTGCCGTCT 

401 TCGTCCCCGA AGGCAAATCG CTGACCGCCC TTTTAAAAGA GTTCCGCGAA 

451 CAGCGCAACC ATATGGCAAT CGTCATCGAC GAATACGGCG GCACGTCGGG 

501 TTTGGTAACT TTTGAAGACA TCATCGAGCA AATCGTCGGC GACATCGAAG 

551 ATGAGTTTGA CGAAGACGAA AGCGCGGACA ACATCCACGC CGTTTCCGCC 

601 GAACGCTGGC GCATCCACGC GGCTACCGAA ATCGAAGACA TCAACGCCTT 

651 TTTCGGCACG GAATACAGCA GCGAAGAAGC CGACACCATC GGCGGCCNTG 

701 GTCATTCAGG AATTGGNACA CCTGCCCGTG CGCGGCGAAA AAGTCNTTAT 

751 CGGCGNOTTG CANTTCACNG TCGCCNGCGC NGACAACCGC CGCCTGCATA 

801 CGCTGATGGC GACCCGCGTG AAGTAAGCTC CGCCGTTTCT GTACAGTTTA 

851 GGATGACGGT ACGGGCGTTT TCTGTTTCAA TCCGCCCCAT CCGCCANACA 

901 TAA 

This encodes a protein having amino acid sequence <SEQ ID 24; ORF5a>: 

1 MDGAQPKTNF XXRLIARLAR EPDSAEDVLT LLRQAHEQEV FDADTLLRLE 
51 KVLDFSDLEV RDAMITRSRM NVLKENDSIE RITAYVIDTA HSRFPVIGED 
101 KDEVLGILHA KDLLKYMFNP EQFHLKSILR PAVFVPEGKS LTALLKEFRE 
151 QRNHMAIVID EYGGTSGLVT FEDIIEQIVG DIEDEFDEDE SADNIHAVSA 
201 ERWRIHAATE IEDINAFFGT EYSSEEADTI GGXGHSGIGT PARARRKSXY 

251 RRXAXHXRXR XQPPPAYADG DPREVSSAVS VQFRMTVRAF SVSIRPIRXT 
301 * 

The originally-identified partial strain B sequence (ORF5) shows 54.7% identity over a 124aa 
overlap with ORF5a: 

10 20 30 

orf5.pep NHMAIVIDEYGGTSGLVTFEDIIEQIVGEI 

I I I I I ! M I II 1 I I 1 I I I i I I I t II I 1 I : I 
orf5a FHLKSILRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVGDI 
130 140 150 160 170 180 

40 50 60 70 80 90 

orf5.pep EDEFDEDDSADNIHAVSSDTWRIHAATEIEDINTFFGTEYSIEEADTIXRPGHSRVGTSA 
I I I I I I I : I I I I M II I : : I I I I II I I I I I I I : I I I I I I I I I 1 I I I I I I : I I ] 
orf5a EDEFDEDESADNIHAVSAERWRIHAATEIEDINAFFGTEYSSEEADTIGGXGHSGIGTPA 
190 200 210 220 230 240 

100 110 120 130 

orf5.pep RARRKSPYRRFAVHRRTRRQPPPAYADGDPREVSXXXXXRRFCTV 

1 I I I I Ml I I I : I I M I I I I I I I M 1 I I 
orf5a RARRKSXYRRXAXHXRXRXQPPPAYADGDPREVSSAVSVQFRMTVRAFSVSIRPIRXTX 

250 260 270 280 290 300 

The complete strain B sequence (ORF5-1) and ORF5a show 92.7% identity in 300 aa overlap: 

10 20 30 40 50 60 

orf 5a . pep MDGAQPKTNFXXRLIARLAREPDSAEDVLTLLRQAHEQEVFDADTLLRLEKVLDFSDLEV 
| 1 | | | | M I I I I I I i I I I I I I I i I I i I : li I I 1 I I I I I I I I I I M I i II I I II I I I I I 
orf 5-1 MDGAQPKTNFFERLIARLAREPDSAEDVLNLLRQAHEQEVFDADTLLRLEKVLDFSDLEV 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 5a pep RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYMFNP 
I M I I I I I I I I I I I I I I M I I I I I I I I I I I I 1 I I M I I I I I I I I M I i I I II I I I I I I I I 
orf 5-1 RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYMFNP 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 5a . pep EQFHLKSILRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVG 
I II I I M I I I I I I I I II II II II II I I I I! i M I II i I I I I i I ! I II i I I I I I I II I I I I 
orf 5-1 EQFHLKSILRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVG 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 5a . pep DIEDEFDEDESADNIHAVSAERWRIHAATEIEDINAFFGTEYSSEEADTIGGXGHSGIGT 
: I I I I I I I I : I I I I I I I I I : I I I I I I I I I I I I I I I : I M I M I I I I I I I I ! I I : I 1 
orf 5-1 EIEDEFDEDDSADNIHAVSSERWRIHAATEIEDINTFFGTEYSSEEADTIRP-GHSRVGT 
190 200 210 220 230 

250 260 270 280 290 300 

orf 5a . pep PARARRKSXYRRXAXHXRXRXQPPPAYADGDPREVSSAVSVQFRMTVRAFSVSIRPIRXT 
I I I I I 1 I III I I 1:1 I I I I I I I I I I I I I I I : I I I : I I I I I II I I I I I I I 1 I 1 I 
orf 5-1 SARARRKSPYRRFAVHRRTRRQPPPAYADGDPREVSTAVSAQFRMTVRAFSVSIRPIRQT 
240 250 260 270 280 290 

Further work identified a partial DNA sequence in N. gonorrhoeae <SEQ ID 25> which encodes 
a protein having amino acid sequence <SEQ ID 26; ORF5ng>: 

1 MDGAQPKTNF FERLIARLAR EPDSAEDVLN LLRQAHEQEV FDADTLTRLE 

51 KVLDFAELEV RDAMITRSRM NVLKENDSIE RITAYVIDTA HSRFPVIGED 

101 KDEVLGILHA KDLLKYMFNP EQFHLKSVLR PAVFVPEGKS LTALLKEFRE 

151 QRNHMAIVID EYGGTSGLVT FEDIIEQIVG DIEDEFDEDE SADDIHSVSA 

201 ERWRIHAATE IEDINAFFGT EYGSEEADTI RRLGHSGIGT PARARRKSPY 

251 RRFAVHRRPR RQPPPAHADG DPREVSRACP HRRFCTV* 

Further analysis revealed the complete gonococcal nucleotide sequence <SEQ ID 27> to be: 

1 ATGGACGGCG CACAACCGAA AACAAATTTT TTTGAACGCC TGATTGCCCG 

51 ACTCGCCCGC GAACCCGATT CCGCCGAAGA CGTATTAAAC CTGCTTCGGC 

101 AGGCGCACGA ACAGGAAGTT TTTGATGCCG ACACACTGAC CCGGCTGGAA 

151 AAAGTATTGG ACTTTGCCGA GCTGGAAGTG CGCGATGCGA TGATTACGCG 

201 CAGCCGCATG AACGTATTGA AAGAAAACGA CAGCATCGAA CGCATCACCG 

251 CCTACGTCAT CGATACCGCC CATTCGCGCT TCCCCGTCAT CGGCGAAGAC 

301 AAAGACGAAG TTTTGGGCAT TTTGCACGCC AAAGACCTGC TCAAATATAT 

351 GTTCAACCCC GAGCAGTTCC ACCTGAAATC CGTCTTGCGC CCTGCCGTTT 

401 TCGTGCCCGA AGGCAAATCT TTGACCGCCC TTTTAAAAGA GTTCCGCGAA 

451 CAGCGCAACC ATATGGCAAT CGTCATCGAC GAATACGGCG GCACGTCGGG 

501 TTTGGTCACC TTTGAAGACA TCATCGAGCA AATCGTCGGT GACATCGAAG 

551 ACGAGTTTGA CGAAGACGAA AGCGccgacg acatCCACTC cgTTTccgCC 

601 GAACGCTGGC GCATCCacgc ggctaCCGAA ATCGAAGaca TCAACGCCTT 

651 TTTCGGTACG GAatacggca gcgaagaagc cgacaccatc cggcggctTG 

701 GTCATTCAGG AATTGGGACA CCTGCCCGTG CGCGGCGAAA AAGTCCTTAt 

751 cggcgGTTTG Cagttcaccg tCGCCCGCGC CGACAACCGC CGCCTGCACA 

801 CGCTGATGGC GACCCGCGTG AAGTAAGCAG AGCCTGCCcg AccgccgttT 
851 CTGCacAGTT TAGGatgACG gtaCGGTCGT TTTCTGTTTC AATCCGCCCC 
901 ATCCGCCAAA CATAA 

This encodes a protein having amino acid sequence <SEQ ID 28; ORF5ng-1>: 



1 MDGAQPKTNF FERLIARLAR EPDSAEDVLN LLRQAHEQEV FDADTLTRLE 
51 KVLDFAELEV RDAMITRSRM NVLKENDSIE RITAYVIDTA HSRFPVIGED 
101 KDEVLGILHA KDLLKYMFNP EQFHLKSVLR PAVFVPEGKS LTALLKEFRE 
151 QRNHMAIVID EYGGTSGLVT FEDIIEQIVG DIEDEFDEDE SADDIHSVSA 
201 ERWRIHAATE IEDINAFFGT EYGSEEADTI RRLGHSGIGT PARARRKSPY 
251 RRFAVHRRPR RQPPPAHADG DPREVSRACP TAVSAQFRMT VRSFSVSIRP 
301 IRQT* 



The originally-identified partial strain B sequence (ORF5) shows 83.1% identity over a 135aa 
overlap with the partial gonococcal sequence (ORF5ng): 



orf5 NHMAIVIDEYGGTSGLVTFEDIIEQIVGEI 30 
II I M I I M I II ! I I I I I ! I 1 I M I II I : 1 
orf5ng FHLKSVLRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVGDI 182 

orf5 EDEFDEDDSADNIHAVSSDTWRIHAATEIEDINTFFGTEYSIEEADTIXRPGHSRVGTSA 90 
I | | | M | : I I I : M ; I I : : I I I I I I I I I I I I I : I M I I I : I I II I I I II! : I I I 
orf5ng EDEFDEDESADDIHSVSAERWRIHAATEIEDINAFFGTEYGSEEADTIRRLGHSGIGTPA 242 

orf5 RARRKSPYRRFAVHRRTRRQPPPAYADGDPREVSX RRFCTV 131 
I M I I II M I II I I M 1111111:1111111)1 MINI 
orf5ng RARRKSPYRRFAVHRRPRRQPPPAHADGDPREVSRACPHRRFCTV 287 



The complete strain B and gonococcal sequences (ORF5-1 & ORF5ng-1) show 92.4% identity in 
304 aa overlap: 

10 20 30 40 50 60 

orf5ng-l .pep MDGAQPKTNFFERLIARLAREPDSAEDVLNLLRQAHEQEVFDADTLTRLEKVLDFAELEV 
I I I I I I I I M I II I I I I I I I I I I 1 I I I I I I I I I I I I II I ! I I I I I I I I I I I I I I :: I I I 
orf 5-1 MDGAQPKTNFFERLIARLAREPDSAEDVLNLLRQAHEQEVFDADTLLRLEKVLDFSDLEV 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf5ng-l .pep RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYMFNP 
I I I I I I I I I I II I I I I I II I I I I I I II I I I M M I 1 M I I I I I II 1 M I I I II I I I II II 
orf 5-1 RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYMFNP 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf5ng-l .pep EQFHLKSVLRPAVFVPEGKSLTALLKE FREQRNHMAI VI DEYGGTSGLVT FEDIIEQIVG 
I I I I I I I : I M I I I I I I I 11 M M M I I I I M I I I I M I I I I I I 1 I M I I I I M I I I M I 
orf 5-1 EQFHLKSILRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVT FEDIIEQIVG 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 5ng-l . pep DIEDEFDEDESADDIHSVSAERWRIHAATEIEDINAFFGTEYGSEEADTIRRLGHSGIGT 
: M I I I I I I : M I : II : I I : I I I i I I M I I I M M : I I I I I I : ( II I I II I I I I : I I 
orf 5-1 EIEDEFDEDDSADNIHAVSSERWRIHAATEIEDINTFFGTEYSSEEADTIRP-GHSRVGT 

190 200 210 220 230 

250 260 270 280 290 300 

orf 5ng-l . pep PARARRKSPYRRFAVHRRPRRQPPPAHADGDPREVSRACPTAVSAQFRMTVRSFSVSIRP 
II I M I I I M I I I I M I I I f I I I I : M I I I I I I I I II I M f II I I I : I I I I II I 

orf 5-1 SARARRKSPYRRFAVHRRTRRQPPPAYADGDPREVS TAVSAQFRMT VRAFSVS I RP 

240 250 260 270 280 290 






orf5ng-l.pep IRQTX 
I I M I 

orf5-l IRQTX 
300 
Computer analysis of these amino acid sequences indicated a putative leader sequence, and 
identified the following homologies: 

Homology with hemolysin homolog TlyC (accession U32716) of H. influenzae 
ORF5 and TlyC proteins show 58% aa identity in 77 aa overlap (BLASTp). 

ORF5 2 HMAIVIDEYGGTSGLVTFEDIIEQIVGEIEDEFDEDDSADNIHAVSSDTWRIHAATEIED 61 
HMAIV+DE+G SGLVT EDI+EQIVG+IEDEFDE++ AD I +S T+ + A T+I+D 
TlyC 166 HMAIVVDEFGAVSGLVTIEDILEQIVGDIEDEFDEEEIAD-IRQLSRHTYAVRALTDIDD 224 

ORF5 62 INTFFGTEYSIEEADTI 78 
N F T++ EE DTI 
TlyC 225 FNAQFNTDFDDEEVDTI 241 

ORF5ng-1 also shows significant homology with TlyC: 






SCORES Init1: 301 Initn: 419 Opt: 668 

Smith-Waterman score: 668; 45.9% identity in 242 aa overlap 



10 20 30 40 50 

orf 5ng-l . pep MDGAQPKTNFFERLIARLAR-EPDSAEDVLNLLRQAHEQEVFDADTLTRLEK 

| ||: | : : | : : I : | :::::: | :::::::: I : I :| 
tlyc haein MNDEQQNSNQSENTKKPFFQSLFGRFFQGELKNREELVEVIRDSEQNDLIDQNTREMIEG 
20 ~ 10 20 30 40 50 60 

60 70 80 90 100 109 

orfSng-l.pep VL D FAE LE VRD AM I TR S RMN VLKEN D S I E R I TAYV I DT AH S R F PV I GE — DKDEVLGILH 
1 : : : I I I : ! I I II 1 I : : :::::::: : i : : I I I M I I I : : I : I : : : I I II 

25 tlyc haein VMEIAELRVRDIMIPRSQIIFIEDQQDLNTCLNTIIESAHSRFPVIADADDRDNIVGILH 

70 80 90 100 110 120 

110 120 130 140 150 160 

orf 5ng-l . pep AKDLLKYMF-NPEQFHLKSVLRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGL 
30 IMIII:: : I I I : I : I I I : I : I I I : I : : I I : H : I I I II I : I I : I : : I I I 

tlyc haein AKDLLKFLREDAEVFDLSSLLRPVVIVPESKRVDRMLKDFRSERFHMAIVVDEFGAVSGL 
130 140 150 160 170 180 

170 180 190 200 210 220 

35 orfSng-l.pep VTFEDIIEQIVGDIEDEFDEDESADDIHSVSAERWRIHAATEIEDINAFFGTEYGSEEAD 

I I : I I I : M I I I M I M II I : I II I : : : I : : : : I I : I : I : I I I : I : : : M : I 
tlyc_haein VTIEDILEQIVGDIEDEFDEEEIAD-IRQLSRHTYAVRALTDIDDFNAQFNTDFDDEEVD 
190 200 210 220 230 

40 230 240 250 260 270 280 

orf 5ng-l . pep TIRRLGHSGIG-TPARARRKSPYRRFAVHRRPRRQPPPAHADGDPREVSRACPTAVSAQF 
I I I : : I I I : 

tlyc haein TIGGLIMQT FGYLPKRGEEIILKNLQFKVTSADSRRLIQLRVTVPDEHLAEMNNVDEKSE 
240 250 260 270 280 290 



Homology with a hypothetical secreted protein from E. coli: 

ORF5a shows homology to a hypothetical secreted protein from E. coli: 



sp|P77392|YBEX_ECOLI HYPOTHETICAL 33.3 KD PROTEIN IN CUTE-ASNB INTERGENIC REGION 
>gi|1778577 (U82598) similar to H. influenzae [Escherichia coli] >gi|1786879 
(AE000170) f292; This 292 aa ORF is 23% identical (9 gaps) to 272 residues of an 
approx. 440 aa protein YTFL_HAEIN SW: P44717 [Escherichia coli] Length = 292 



Score = 212 bits (533), Expect = 3e-54 

Identities = 112/230 (48%), Positives = 149/230 (64%), Gaps = 3/230 (1%) 

Query: 2 DGAQPKTNFXXRLIARLAR-EPDSAEDVLTLLRQAHEQEVFDADTLLRLEKVLDFSDLEV 60 
D K F L+++L EP + +++L L+R + + ++ D DT LE V+D +D V 
Sbjct: 10 DTISNKKGFFSLLLSQLFHGEPKNRDELLALIRDSGQNDLIDEDTRDMLEGVMDIADQRV 69 

Query: 61 RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYM-FN 119 
RD MI RS+M LK N +++ +I++AHSRFPVI EDKD + GIL AKDLL +M + 
Sbjct: 70 RDIMIPRSQMITLKRNQTLDECLDVIIESAHSRFPVISEDKDHIEGILMAKDLLPFMRSD 129 

Query: 120 PEQFHLKSILRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIV 179 
E F + +LR AV VPE K + +LKEFR QR HMAIVIDE+GG SGLVT EDI+E IV 
Sbjct: 130 AEAFSMDKVLRQAVVVPESKRVDRMLKEFRSQRYHMAIVIDEFGGVSGLVTIEDILELIV 189 

Query: 180 GDIEDEFDEDESADNIHAVSAERWRIHAATEIEDINAFFGTEYSSEEADT 229 
G+IEDE+DE++ D +S W + A IED N FGT +S EE DT 
Sbjct: 190 GEIEDEYDEEDDID-FRQLSRHTWTVRALASIEDFNEAFGTHFSDEEVDT 238 
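
The percentages printed in the BLAST summary lines of this document can be recomputed from the raw counts. The sketch below is an illustration only; the helper name `blast_percent` is hypothetical, and the use of truncation rather than rounding is an inference from the figures reported here (for example 142/195 is about 72.8% but is printed as 72%), not a statement about the BLAST implementation.

```python
def blast_percent(count: int, alignment_length: int) -> int:
    """Integer percentage obtained by truncation (floor division)."""
    if alignment_length <= 0:
        raise ValueError("alignment length must be positive")
    return (100 * count) // alignment_length


# Counts taken from the two BLAST summaries quoted in this document.
reported = {
    "yvfc identities": (114, 195),
    "yvfc positives": (142, 195),
    "YBEX identities": (112, 230),
    "YBEX positives": (149, 230),
    "YBEX gaps": (3, 230),
}
for label, (count, length) in reported.items():
    print(f"{label}: {count}/{length} ({blast_percent(count, length)}%)")
```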

Based on this analysis, including the amino acid homology to the TlyC hemolysin-homologue from 
H. influenzae (hemolysins are secreted proteins), it was predicted that the proteins from 
N. meningitidis and N. gonorrhoeae are secreted and could thus be useful antigens for vaccines or 
diagnostics. 

ORF5-1 (30.7kDa) was cloned in the pGex vector and expressed in E. coli, as described above. The 
products of protein expression and purification were analyzed by SDS-PAGE. Figure 2A shows 
the results of affinity purification of the GST-fusion protein. Purified GST-fusion protein was used 
to immunise mice, whose sera were used for Western blot analysis (Figure 1B). These experiments 
confirm that ORF5-1 is a surface-exposed protein, and that it is a useful immunogen. 

Example 5 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 29>: 

1 ATGCGCGGCG GCAGGCCGGA TTCCGTTACC GTGCAGATTA TCGAAGGTTC 

51 GCGTTTTTCG CATATGAGGA AAGTCATCGA CGCAACGCCC GACATCGGAC 

101 ACGACACCAA AGGCTGGAGC AATGAAAAAC TGATGGCGGA AGTTGCGCCC 

151 GATGCCTTCA GCGGCAATCC TGAAgGGCAG TTTTTCCCCG ACAGCTACGA 

201 AATCGATGCG GGCGGCAGTG ATTTGCAGAT TTACCAAACC GCCTACAAgG 

251 GCGATGCAAC GCCGCCTGAA TGAgGGCATG GGAAAGCAGG CAGGACGGGC 

301 TGCCTTATAA AAACCCTTAT GAAATGCTGA TTATGGCGAr CCTGGTCGAA 

351 AAGGAAACAG GGCATGAAGC CGAsCsCGAC CATGTcGCTT CCGTCTTCGT 

401 CAACCGCCTG AAAATCGGTA TGCGCCTGCA AACCgAssCG TCCGTGATTT 

451 ACGGCATGGG TGCGGCATAC AAGGGCAAAA TCCGTAAAGC CGACCTGCGC 

501 CGCGACACGC CGTACAACAC CTACACGCGC GGCGGTCTGC CGCCAACCCC 

551 GATTGCGCTG CCC. 

This corresponds to the amino acid sequence <SEQ ID 30; ORF7>: 

1 MRGGRPDSVT VQIIEGSRFS HMRKVIDATP DIGHDTKGWS NEKLMAEVAP 

51 DAFSGNPEGQ FFPDSYEIDA GGSDLQIYQT AYKAMQRRLN EAWESRQDGL 

101 PYKNPYEMLI MAXLVEKETG HEAXXDHVAS VFVNRLKIGM RLQTXXSVIY 

151 GMGAAYKGKI RKADLRRDTP YNTYTRGGLP PTPIALP.. 

Further sequence analysis revealed the complete DNA sequence <SEQ ID 31>: 

1 ATGTTGAGAA AATTGTTGAA ATGGTCTGCC GTTTTTTTGA CCGTGTCGGC 

51 AGCCGTTTTC GCCGCGCTGC TTTTTGTTCC TAAGGATAAC GGCAGGGCAT 

101 ACCGAATCAA AATTGCCAAA AACCAGGGTA TTTCGTCGGT CGGCAGGAAA 

151 CTTGCCGAAG ACCGCATCGT GTTCAGCAGG CATGTTTTGA CGGCGGCGGC 

201 CTACGTTTTG GGTGTGCACA ACAGGCTGCA TACGGGGACG TACAGATTGC 

251 CTTCGGAAGT GTCTGCTTGG GATATCTTGC AGAAAATGCG CGGCGGCAGG 

301 CCGGATTCCG TTACCGTGCA GATTATCGAA GGTTCGCGTT TTTCGCATAT 

351 GAGGAAAGTC ATCGACGCAA CGCCCGACAT CGGACACGAC ACCAAAGGCT 

401 GGAGCAATGA AAAACTGATG GCGGAAGTTG CGCCCGATGC CTTCAGCGGC 

451 AATCCTGAAG GGCAGTTTTT CCCCGACAGC TACGAAATCG ATGCGGGCGG 
501 CAGTGATTTG CAGATTTACC AAACCGCCTA CAAGGCGATG CAACGCCGCC 

551 TGAATGAGGC ATGGGAAAGC AGGCAGGACG GGCTGCCTTA TAAAAACCCT 

601 TATGAAATGC TGATTATGGC GAGCCTGGTC GAAAAGGAAA CAGGGCATGA 

651 AGCCGACCGC GACCATGTCG CTTCCGTCTT CGTCAACCGC CTGAAAATCG 

701 GTATGCGCCT GCAAACCGAC CCGTCCGTGA TTTACGGCAT GGGTGCGGCA 

751 TACAAGGGCA AAATCCGTAA AGCCGACCTG CGCCGCGACA CGCCGTACAA 

801 CACCTACACG CGCGGCGGTC TGCCGCCAAC CCCGATTGCG CTGCCCGGCA 

851 AGGCGGCACT CGATGCCGCC GCCCATCCGT CCGGCGAAAA ATACCTGTAT 

901 TTCGTGTCCA AAATGGACGG CACGGGCTTG AGCCAGTTCA GCCATGATTT 

951 GACCGAACAC AATGCCGCCG TCCGCAAATA TATTTTGAAA AAATAA 

This corresponds to the amino acid sequence <SEQ ID 32; ORF7-1>: 



1 MLRKLLKWSA VFLTVSAAVF AALLFVPKDN GRAYRIKIAK NQGISSVGRK 

51 LAEDRIVFSR HVLTAAAYVL GVHNRLHTGT YRLPSEVSAW DILQKMRGGR 

101 PDSVTVQIIE GSRFSHMRKV IDATPDIGHD TKGWSNEKLM AEVAPDAFSG 

151 NPEGQFFPDS YEIDAGGSDL QIYQTAYKAM QRRLNEAWES RQDGLPYKNP 

201 YEMLIMASLV EKETGHEADR DHVASVFVNR LKIGMRLQTD PSVIYGMGAA 

251 YKGKIRKADL RRDTPYNTYT RGGLPPTPIA LPGKAALDAA AHPSGEKYLY 

301 FVSKMDGTGL SQFSHDLTEH NAAVRKYILK K* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with hypothetical protein encoded by yceg gene (accession P44270) of H. influenzae 
ORF7 and yceg proteins show 44% aa identity in 192 aa overlap: 



ORF7 1 MRGGRPDSVTVQIIEGSRFSHMRKVIDATPDIGHDTKGWSNEKLMA EVAPDAFSG 55 

+ G+ V+ IEG F RK ++ P + K SNE++ A ++ + 

yceg 102 LNSGKEVQFNVKWIEGKTFKDWRKDLENAPHLVQTLKDKSNEEIFALLDLPDIGQNLELK 161 

ORF7 56 NPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWESRQDGLPYKNPYEMLIMAXLV 115 

N EG +PD+Y +DL++ + + + M++ LN+AW R + LP NPYEMLI+A +V 

yceg 162 NVEGWLYPDTYNYTPKSTDLELLKRSAERMKKALNKAWNERDEDLPLANPYEMLILASIV 221 

ORF7 116 EKETGHEAXXDHVASVFVNRLKIGMRLQTXXSVIYGMGAAYKGKIRKADLRRDTPYNTYT 175 

EKETG VASVF+NRLK M+LQT +VIYGMG Y G IRK DL TPYNTY 

yceg 222 EKETGIANERAKVASVFINRLKAKMKLQTDPTVIYGMGENYNGNIRKKDLETKTPYNTYV 281 

ORF7 176 RGGLPPTPIALP 187 

GLPPTPIA+P 
yceg 282 IDGLPPTPIAMP 293 

The complete length YCEG protein has sequence: 



1 MKKFLIAILL LILILAGVAS FSYYKMTEFV KTPVNVQADE LLTIERGTTS 

51 SKLATLFEQE KLIADGKLLP YLLKLKPELN KIKAGTYSLE NVKTVQDLLD 

101 LLNSGKEVQF NVKWIEGKTF KDWRKDLENA PHLVQTLKDK SNEEIFALLD 

151 LPDIGQNLEL KNVEGWLYPD TYNYTPKSTD LELLKRSAER MKKALNKAWN 

201 ERDEDLPLAN PYEMLILASI VEKETGIANE RAKVASVFIN RLKAKMKLQT 

251 DPTVIYGMGE NYNGNIRKKD LETKTPYNTY VIDGLPPTPI AMPSESSLQA 

301 VANPEKTDFY YFVADGSGGH KFTRNLNEHN KAVQEYLRWY RSQKNAK 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF7 shows 95.2% identity over a 187aa overlap with an ORF (ORF7a) from strain A of N. 
meningitidis: 



10 20 30 

orf 7 .pep MRGGRPDSVTVQIIEGSRFSHMRKVIDATP 

I I I M I I I I I II I I I I I M I I M I M I I I I 
orf 7a AAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKVIDATP 
70 80 90 100 110 120 



orf 7 .pep 



40 50 60 70 80 90 

DIGHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRRLN 
II < I I I I I M I I II I I II I I I I I I I t I I I I I t I I I I t I I M I I I : | | | | | | | | | | | ( | 
DIEHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLRIYQIAYKAMQRRLN 
130 140 150 160 170 180 

100 110 120 130 140 150 

EAWESRQDGLPYKNPYEMLIMAXLVEKETGHEAXXDHVASVFVNRLKIGMRLQTXXSVIY 

| M | | M | | I | | ] 1 II I I 1 I I I I : I I I i I M I i I I U I I I I I I I I I 1 t I I I 1 I t I 
EAWESRQDGLPYKNPYEMLIMASLIEKETGHEADRDHVASVFVNRLKIGMRLQTDPSVIY 

190 200 210 220 230 240 

160 170 180 

GMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALP 
1 I M I I I M I I I I I I I I I I 1 I I I I M I I I I I I 1 I M I 

GMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLYFVSKM 
250 260 270 280 290 300 

DGTGLSQFSHDLTEHNAAVRKYILKKX 
310 320 330 

The complete length ORF7a nucleotide sequence <SEQ ID 33> is: 

1 ATGTTGAGAA AATTGTTGAA ATGGTCTGCC GTTTTTTTGA CCGTATCGGC 

51 AGCCGTTTTC GCCGCGCTGC TTTTCGTCCC TAAAGACAAC GGCAGGGCAT 

101 ACAGGATTAA AATTGCCAAA AACCAGGGTA TTTCGTCGGT CGGCAGGAAA 

151 CTTGCCGAAG ACCGCATCGT GTTCAGCAGG CATGTTTTGA CGGCGGCGGC 

201 CTACGTTTTG GGTGTGCACA ACAGGCTGCA TACGGGGACG TACAGACTGC 

251 CTTCGGAAGT GTCTGCTTGG GATATCTTGC AGAAAATGCG CGGCGGCAGG 

301 CCGGATTCCG TTACCGTGCA GATTATCGAA GGTTCGCGTT TTTCGCATAT 

351 GAGGAAAGTC ATCGACGCAA CGCCCGACAT CGAACACGAC ACCAAAGGCT 

401 GGAGCAATGA AAAACTGATG GCGGAAGTTG CCCCTGATGC CTTCAGCGGC 

451 AATCCTGAAG GGCAGTTTTT CCCCGACAGC TACGAAATCG ATGCGGGCGG 

501 CAGCGATTTA CGGATTTACC AAATCGCCTA CAAGGCGATG CAACGCCGAC 

551 TGAATGAGGC ATGGGAAAGC AGGCAGGACG GGCTGCCTTA TAAAAACCCT 

601 TATGAAATGC TGATTATGGC GAGCCTGATC GAAAAGGAAA CAGGGCATGA 

651 AGCCGACCGC GACCATGTCG CTTCCGTCTT CGTCAACCGC CTGAAAATCG 

701 GTATGCGCCT GCAAACCGAC CCGTCCGTGA TTTACGGCAT GGGTGCGGCA 

751 TACAAGGGCA AAATCCGTAA AGCCGACCTG CGCCGCGACA CGCCGTACAA 

801 CACCTACACG CGCGGCGGTC TGCCGCCAAC CCCGATCGCG CTGCCCGGCA 

851 AGGCGGCACT CGATGCCGCC GCCCATCCGT CCGGTGAAAA ATACCTGTAT 

901 TTCGTGTCCA AAATGGACGG TACGGGCTTG AGCCAGTTCA GCCATGATTT 

951 GACCGAACAC AACGCCGCCG TTCGCAAATA TATTTTGAAA AAATAA 

This is predicted to encode a protein having amino acid sequence <SEQ ID 34>: 

1 MLRKLLKWSA VFLTVSAAVF AALLFVPKDN GRAYRIKIAK NQGISSVGRK 

51 LAEDRIVFSR HVLTAAAYVL GVHNRLHTGT YRLPSEVSAW DILQKMRGGR 

101 PDSVTVQIIE GSRFSHMRKV IDATPDIEHD TKGWSNEKLM AEVAPDAFSG 

151 NPEGQFFPDS YEIDAGGSDL RIYQIAYKAM QRRLNEAWES RQDGLPYKNP 

201 YEMLIMASLI EKETGHEADR DHVASVFVNR LKIGMRLQTD PSVIYGMGAA 

251 YKGKIRKADL RRDTPYNTYT RGGLPPTPIA LPGKAALDAA AHPSGEKYLY 

301 FVSKMDGTGL SQFSHDLTEH NAAVRKYILK K* 

A leader peptide is underlined. 



ORF7a and ORF7-1 show 98.8% identity in 331 aa overlap: 

10 20 30 40 50 60 

orf7a.pep MLRKLLKWSAVFLTVSAAVFAALLFVPKDNGRAYRIKIAKNQGISSVGRKLAEDRIVFSR 
          |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf7-1    MLRKLLKWSAVFLTVSAAVFAALLFVPKDNGRAYRIKIAKNQGISSVGRKLAEDRIVFSR 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf7a.pep HVLTAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKV 
          |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf7-1    HVLTAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKV 

70 80 90 100 110 120 






130 140 150 160 170 180 
orf7a.pep IDATPDIEHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLRIYQIAYKAM 
HUM I I I I M II I I 1 ! I I I U M 1 M I i I f II M i i I i f t ( I t 1 M 1 I 1 t I MM 
orf7-1 IDATPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLQIYQTAYKAM 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf7a.pep QRRLNEAWESRQDGLPYKNPYEMLIMASLIEKETGHEADRDHVASVFVNRLKIGMRLQTD 
i I M I M M 1 II I M M i M I I I M M M : I II M I I II I M I M I I I M I M II I I M I 
orf7-1 QRRLNEAWESRQDGLPYKNPYEMLIMASLVEKETGHEADRDHVASVFVNRLKIGMRLQTD 
190 200 210 220 230 240 

250 260 270 280 290 300 

orf7a pep PSVIYGMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLY 
M | I M I II M M M II II I I II M M I M I I I II II M II II II M M I M i I I II I II 
orf7-1 PSVIYGMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLY 

250 260 270 280 290 300 

310 320 330 

orf7a.pep FVSKMDGTGLSQFSHDLTEHNAAVRKYILKKX 
          |||||||||||||||||||||||||||||||| 
orf7-1    FVSKMDGTGLSQFSHDLTEHNAAVRKYILKKX 

310 320 330 
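
The identity figure quoted for this ungapped alignment can be checked directly from the two listings. The following is a minimal sketch only: the helper names (`clean`, `mismatches`, `percent_identity`) are illustrative assumptions and are not the alignment program used for the original analysis, and the approach works only for equal-length sequences aligned without gaps. The sequences are copied from SEQ ID 32 and SEQ ID 34, without the terminal stop symbol.

```python
# Sketch: percent identity for two ungapped, equal-length protein sequences.
# Helper names are illustrative; sequences are SEQ ID 32 (ORF7-1) and SEQ ID 34 (ORF7a).

ORF7_1 = """
MLRKLLKWSA VFLTVSAAVF AALLFVPKDN GRAYRIKIAK NQGISSVGRK
LAEDRIVFSR HVLTAAAYVL GVHNRLHTGT YRLPSEVSAW DILQKMRGGR
PDSVTVQIIE GSRFSHMRKV IDATPDIGHD TKGWSNEKLM AEVAPDAFSG
NPEGQFFPDS YEIDAGGSDL QIYQTAYKAM QRRLNEAWES RQDGLPYKNP
YEMLIMASLV EKETGHEADR DHVASVFVNR LKIGMRLQTD PSVIYGMGAA
YKGKIRKADL RRDTPYNTYT RGGLPPTPIA LPGKAALDAA AHPSGEKYLY
FVSKMDGTGL SQFSHDLTEH NAAVRKYILK K
"""

ORF7A = """
MLRKLLKWSA VFLTVSAAVF AALLFVPKDN GRAYRIKIAK NQGISSVGRK
LAEDRIVFSR HVLTAAAYVL GVHNRLHTGT YRLPSEVSAW DILQKMRGGR
PDSVTVQIIE GSRFSHMRKV IDATPDIEHD TKGWSNEKLM AEVAPDAFSG
NPEGQFFPDS YEIDAGGSDL RIYQIAYKAM QRRLNEAWES RQDGLPYKNP
YEMLIMASLI EKETGHEADR DHVASVFVNR LKIGMRLQTD PSVIYGMGAA
YKGKIRKADL RRDTPYNTYT RGGLPPTPIA LPGKAALDAA AHPSGEKYLY
FVSKMDGTGL SQFSHDLTEH NAAVRKYILK K
"""


def clean(seq: str) -> str:
    """Remove the spaces and line breaks used to lay out the listing."""
    return "".join(seq.split())


def mismatches(a: str, b: str):
    """Return (1-based position, residue in a, residue in b) for each difference."""
    if len(a) != len(b):
        raise ValueError("this sketch only handles equal-length, ungapped sequences")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


def percent_identity(a: str, b: str) -> float:
    """Identical positions as a percentage of the overlap length."""
    return 100.0 * (len(a) - len(mismatches(a, b))) / len(a)


orf7a, orf7_1 = clean(ORF7A), clean(ORF7_1)
diffs = mismatches(orf7a, orf7_1)
print(f"overlap: {len(orf7a)} aa, differences: {len(diffs)}, "
      f"identity: {percent_identity(orf7a, orf7_1):.1f}%")
for position, res_a, res_1 in diffs:
    print(f"  position {position}: ORF7a {res_a} / ORF7-1 {res_1}")
```

With the sequences as listed, the two proteins differ at four of 331 positions (128, 171, 175 and 210), which is consistent with the 98.8% identity stated above.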

Homology with a predicted ORF from N. gonorrhoeae 
ORF7 shows 94.7% identity over a 187aa overlap with a predicted ORF (ORF7.ng) from N. 
gonorrhoeae: 

orf7   MRGGRPDSVTVQIIEGSRFSHMRKVIDATPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQ 60 
       |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf7ng MRGGRPDSVTVQIIEGSRFSHMRKVIDATPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQ 60 

orf7 FFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWESRQDGLPYKNPYEMLIMAXLVEKETG 120 

I II I I I II I I I I I I II I I I I I I II I M I I I M I : I I M I I M I I I I I II II MIIMI 
orf7ng FFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWAGRQDGLPYKNPYEMLIMASLIEKETG 120 






orf7 HEAXXDHVASVFVNRLKIGMRLQTXXSVIYGMGAAYKGKIRKADLRRDTPYNTYTRGGLP 180 

Ml M I I II II I I II M II M I I M II I M M M II II I I I M I M I M II MM 
orf7ng HEADRDHVASVFVNRLKIGMRLQTDPSVIYGMGAAYKGKIRKADLRRDTPYNTYTGGGLP 180 

orf7 PTPIALP 187 

40 M M M 

orf7ng PTRIALPGKAAMDAAAHPSGEKYLYFVSKMDGTGLSQFSHDLTEHNAAVRKYILKK 236 

An ORF7ng nucleotide sequence <SEQ ID 35> is predicted to encode a protein having amino acid 
sequence <SEQ ID 36>: 

1 MRGGRPDSVT VQIIEGSRFS HMRKVIDATP DIGHDTKGWS NEKLMAEVAP 

51 DAFSGNPEGQ FFPDSYEIDA GGSDLQIYQT AYKAMQRRLN EAWAGRQDGL 

101 PYKNPYEMLI MASLIEKETG HEADRDHVAS VFVNRLKIGM RLQTDPSVIY 

151 GMGAAYKGKI RKADLRRDTP YNTYTGGGLP PTRIALPGKA AMDAAAHPSG 

201 EKYLYFVSKM DGTGLSQFSH DLTEHNAAVR KYILKK* 

Further sequence analysis revealed a partial DNA sequence of ORF7ng <SEQ ID 37>: 

1 ..taccgaatca AGATTGCCAA AAATCAGGGT ATTTCGTCGG TCGGCAGGAA 

51 ACTTGCcgaA GACCGCATCG TGTTCAGCAG GCATGTTTTG ACAGCGGCGG 

101 CCTACGTTTT GGGTGTGCAC AACAGGCTGC ATACGGGGAC gTACAGATTG 

151 CCTTCGGAAG TGTCTGCTTG GGATATCTTG CAGAAAATGC GCGGCGGCAG 

201 GCCGGATTCC GTTACCGTGC AGATTATCGA AGGTTCGCGT TTTTCGCATA 

251 TGAGGAAAGT CATCGACGCA ACGCCCGACA TCGGACACGA CACCAAAGGC 

301 TGGAGCAATG AAAAACTGAT GGCGGAAGTT GCGCCCGATG CCTTCAGCGG 

351 CAATCCTGAA GGGCAGTTTT TTCCCGACAG CTACGAAATC GATGCGGGCG 

401 GCAGCGATTT GCAGATTTAC CAAACCGCCT ACAAGGCGAT GCAACGCCGC 

451 CTGAACGAGG CATGGGCAGG CAGGCAGGAC GGGCTGCCTT ATAAAAACCC 

501 TTATGAAATG CTGATTATGG CGAGCCTGAT CGAAAAGGAA ACGGGGCATG 
551 AGGCCGACCG CGACCATGTC GCTTCCGTCT TCGTCAACCG CCTGAAAATC 

601 GGTATGCGCC TGCAAACCGA CCCGTCCGTG ATTTACGGCA TGGGTGCGGC 

651 ATACAAGGGC AAAATCCGTA AAGCCGACCT GCGCCGCGAC ACGCCGTACA 

701 aCAccTAtac gggcgggggc ttgccgccaa cccggattgc gctgcccggC 

751 Aaggcggcaa tggatgccgc cgcccacccg tccggcgaAa aatacctgTa 

801 tttcgtgtcC AAAATGGACG GCACGGGCTT GAGCCAGTTC AGCCATGATT 

851 TGACCGAACA CAACGCCGCc gTcCGCAAAT ATATTTTGAA AAAATAA 

This corresponds to the amino acid sequence <SEQ ID 38; ORF7ng-1>: 

1 ..YRIKIAKNQG ISSVGRKLAE DRIVFSRHVL TAAAYVLGVH NRLHTGTYRL 

51 PSEVSAWDIL QKMRGGRPDS VTVQIIEGSR FSHMRKVIDA TPDIGHDTKG 

101 WSNEKLMAEV APDAFSGNPE GQFFPDSYEI DAGGSDLQIY QTAYKAMQRR 

151 LNEAWAGRQD GLPYKNPYEM LIMASLIEKE TGHEADRDHV ASVFVNRLKI 

201 GMRLQTDPSV IYGMGAAYKG KIRKADLRRD TPYNTYTGGG LPPTRIALPG 

251 KAAMDAAAHP SGEKYLYFVS KMDGTGLSQF SHDLTEHNAA VRKYILKK* 

ORF7ng-1 and ORF7-1 show 98.0% identity in 298 aa overlap: 

10 20 30 40 50 60 

or f 7-1 . pep KLLKWSAVFLTVSAAVFAALLFVPKDNGRAYRIKIAKNQGISSVGRKLAEDRIVFSRHVL 

I I II I I I I I I I M I II I I I I M ! I I I I I I I 
orf7ng-l YRIKIAKNQGISSVGRKLAE DRIVFSRHVL 

10 20 30 

70 80 90 100 110 120 

orf7-1.pep TAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKVIDA 
           |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf7ng-1   TAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKVIDA 

40 50 60 70 80 90 

130 140 150 160 170 180 

orf7-1.pep TPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRR 
           |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf7ng-1   TPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRR 
100 110 120 130 140 150 

190 200 210 220 230 240 

orf7-1.pep LNEAWESRQDGLPYKNPYEMLIMASLVEKETGHEADRDHVASVFVNRLKIGMRLQTDPSV 

I I I II : I I I I I I I I I I I I I I I II I I : II I I I II I I I I I I I I I I I I I M I I M I I I M I I 
orf7ng-1 LNEAWAGRQDGLPYKNPYEMLIMASLIEKETGHEADRDHVASVFVNRLKIGMRLQTDPSV 
160 170 180 190 200 210 

250 260 270 280 290 300 

orf 7-1 . pep IYGMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLYFVS 

I I M I I I I M I M I I I I I I I I I I I II I I I I I I I I I I I I I I I : I I I I I I I I I I I II I II 
orf7ng-l IYGMGAAYKGKIRKADLRRDTPYNTYTGGGLPPTRIALPGKAAMDAAAHPSGEKYLYFVS 

220 230 240 250 260 270 


310 320 330 

orf7-1.pep KMDGTGLSQFSHDLTEHNAAVRKYILKKX 
           ||||||||||||||||||||||||||||| 
orf7ng-1   KMDGTGLSQFSHDLTEHNAAVRKYILKKX 

280 290 



In addition, ORF7ng-1 shows significant homology with a hypothetical E. coli protein: 

sp|P28306|YCEG_ECOLI HYPOTHETICAL 38.2 KD PROTEIN IN PABC-HOLB INTERGENIC REGION 
gi|1787339 (AE000210) o340; 100% identical to fragment YCEG_ECOLI SW: P28306 but 
has 97 additional C-terminal residues [Escherichia coli] Length = 340 

Score = 79 (36.2 bits), Expect = 5.0e-57, Sum P(2) = 5.0e-57 
Identities = 20/87 (22%), Positives = 40/87 (45%) 

Query: 10 GISSVGRKLAEDRIVFSRHVLTAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPD 69 

G ++G +L D+I+ V + + GTYR +++ ++L+ + G+ 

Sbjct: 49 GRLALGEQLYADKIINRPRVFQWLLRIEPDLSHFKAGTYRFTPQMTVREMLKLLESGKEA 108 

Query: 70 SVTVQIIEGSRFSHMRKVIDATPDIGH 96 
++++EG R S K + P I H 
Sbjct: 109 QFPLRLVEGMRLSDYLKQLREAPYIKH 135 

Score = 438 (200.7 bits), Expect = 5.0e-57, Sum P(2) = 5.0e-57 
Identities = 84/155 (54%), Positives = 111/155 (71%) 



Query: 120 [residues not legible in source] 
... MAS+IEK 
Sbjct: 158 [residues not legible in source] 

Query: 180 ETGHEADRDHVASVFVNRLKIGMRLQTDPSVIYGMGAAYKGKIRKADLRRDTPYNTYTGG 239 
ET ++RD VASVF+NRL+IGMRLQTDP+VIYGMG Y GK+ +ADL T YNTYT 
Sbjct: 218 ETAVASERDKVASVFINRLRIGMRLQTDPTVIYGMGERYNGKLSRADLETPTAYNTYTIT 277 

Query: 240 GLPPTRIALPGKAAMDAAAHPSGEKYLYFVSKMDG 274 
GLPP IA PG ++ AAAHP+ YLYFV+ G 
Sbjct: 278 GLPPGAIATPGADSLKAAAHPAKTPYLYFVADGKG 312 

Based on this analysis, including the fact that the H. influenzae YCEG protein possesses a possible 
leader sequence, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their 
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 6 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 39>: 

1 CGTTTCAAAA TGTTAACTGT GTTGACGGCA ACCTTGATTG CCGGACAGGT 

51 ATCTGCCGCC GGAGGCGGTG CGGGGGATAT GAAACAGCCG AAGGAAGTCG 

101 GAAAGGTTTT CAGAAAGCAG CAGCGTTACA GCGAGGAAGA AATCAAAAAC 

151 GAACGCGCAC GGCTTGCGGC AGTGGGCGAG CGGGTTAATC AGATATTTAC 

201 GTTGCTGGGA GGGGAAACCG CCTTGCAAAA GGGGCAGGCG GGAACGGCTC 

251 TGGCAACCTA TATGCTGATG TTGGAACGCA CAAAATCCCC CGAAGTCGCC 

301 GAACGCGCCT TGGAAATGGC CGTGTCGCTG AACGCGTTTG AACAGGCGGA 

351 AATGATTTAT CAGAAATGGC GGCAGATTGA GCCTATACCG GGTAAGGCGC 

401 AAAAACGGGC GGGGTGGCTG CGGAACGTGC TGAGGGAAAG AGGAAATCAG 

451 CATCTGGACG GACGGGAAGA AGTGCTGGCT CAGGCGGACG AAGGACAG 

This corresponds to the amino acid sequence <SEQ ID 40; ORF9>: 

1 ..RFKMLTVLTA TLIAGQVSAA GGGAGDMKQP KEVGKVFRKQ QRYSEEEIKN 

51 ERARLAAVGE RVNQIFTLLG GETALQKGQA GTALATYMLM LERTKSPEVA 

101 ERALEMAVSL NAFEQAEMIY QKWRQIEPIP GKAQKRAGWL RNVLRERGNQ 

151 HLDGREEVLA QADEGQ 
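
The correspondence between a DNA listing and its amino acid sequence can be reproduced with a simple codon-by-codon translation. The sketch below is illustrative: the function name `translate` and the use of `X` for any codon containing an ambiguous base are assumptions of this example, and the amino acid assignments are those of the standard genetic code. The input is the first line of SEQ ID 39, read in the frame that starts at its first nucleotide.

```python
# Sketch: translate a DNA listing into protein using the standard genetic code.
from itertools import product

BASES = "TCAG"
# Amino acids in the conventional TCAG codon order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; spaces, digits and letter case are ignored.

    Codons containing anything other than A, C, G or T (for example N)
    are reported as 'X'. Trailing bases that do not fill a codon are dropped.
    """
    seq = "".join(ch for ch in dna.upper() if ch.isalpha())
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


# First 50 nucleotides of SEQ ID 39 as printed in the listing.
first_line = "CGTTTCAAAA TGTTAACTGT GTTGACGGCA ACCTTGATTG CCGGACAGGT"
print(translate(first_line))
```

The sixteen complete codons in this line translate to RFKMLTVLTATLIAGQ, which matches the start of SEQ ID 40.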

Further sequence analysis revealed the complete DNA sequence <SEQ ID 41>: 

1 ATGTTACCTA ACCGTTTCAA AATGTTAACT GTGTTGACGG CAACCTTGAT 

51 TGCCGGACAG GTATCTGCCG CCGGAGGCGG TGCGGGGGAT ATGAAACAGC 

101 CGAAGGAAGT CGGAAAGGTT TTCAGAAAGC AGCAGCGTTA CAGCGAGGAA 

151 GAAATCAAAA ACGAACGCGC ACGGCTTGCG GCAGTGGGCG AGCGGGTTAA 

201 TCAGATATTT ACGTTGCTGG GAGGGGAAAC CGCCTTGCAA AAGGGGCAGG 

251 CGGGAACGGC TCTGGCAACC TATATGCTGA TGTTGGAACG CACAAAATCC 

301 CCCGAAGTCG CCGAACGCGC CTTGGAAATG GCCGTGTCGC TGAACGCGTT 

351 TGAACAGGCG GAAATGATTT ATCAGAAATG GCGGCAGATT GAGCCTATAC 

401 CGGGTAAGGC GCAAAAACGG GCGGGGTGGC TGCGGAACGT GCTGAGGGAA 

451 AGAGGAAATC AGCATCTGGA CGGACTGGAA GAAGTGCTGG CTCAGGCGGA 

501 CGAAGGACAG AACCGCAGGG TGTTTTTATT GTTGGCACAA GCCGCCGTGC 

551 AACAGGACGG GTTGGCGCAA AAAGCATCGA AAGCGGTTCG CCGCGCGGCG 

601 TTGAAATATG AACATCTGCC CGAAGCGGCG GTTGCCGATG TGGTGTTCAG 

651 CGTACAGGGA CGCGAAAAGG AAAAGGCAAT CGGAGCTTTG CAGCGTTTGG 

701 CGAAGCTCGA TACGGAAATA TTGCCCCCCA CTTTAATGAC GTTGCGTCTG 

751 ACTGCACGCA AATATCCCGA AATACTCGAC GGCTTTTTCG AGCAGACAGA 

801 CACCCAAAAC CTTTCGGCCG TCTGGCAGGA AATGGAAATT ATGAATCTGG 

851 TTTCCCTGCA CAGGCTGGAT GATGCCTATG CGCGTTTGAA CGTGCTGTTG 
901 GAACGCAATC CGAATGCAGA CCTGTATATT CAGGCAGCGA TATTGGCGGC 

951 AAACCGAAAA GAAGGTGCTT CCGTTATCGA CGGCTACGCC GAAAAGGCAT 

1001 ACGGCAGGGG GACGGAGGAA CAGCGGAGCA GGGCGGCGCT AACGGCGGCG 

1051 ATGATGTATG CCGACCGCAG GGATTACGCC AAAGTCAGGC AGTGGCTGAA 

1101 AAAAGTATCC GCGCCGGAAT ACCTGTTCGA CAAAGGTGTG CTGGCGGCTG 

1151 CGGCGGCTGT CGAGTTGGAC GGCGGCAGGG CGGCTTTGCG GCAGATCGGC 

1201 AGGGTGCGGA AACTTCCCGA ACAGCAGGGG CGGTATTTTA CGGCAGACAA 

1251 TTTGTCCAAA ATACAGATGC TCGCCCTGTC GAAGCTGCCC GATAAACGGG 

1301 AGGCTTTGAG GGGGTTGGAC AAGATTATCG AAAAACCGCC TGCCGGCAGT 

1351 AATACAGAGT TACAGGCAGA GGCATTGGTA CAGCGGTCAG TTGTTTACGA 

1401 TCGGCTTGGC AAGCGGAAAA AAATGATTTC AGATCTTGAA AGGGCGTTCA 

1451 GGCTTGCACC CGATAACGCT CAGATTATGA ATAATCTGGG CTACAGCCTG 

1501 CTGACCGATT CCAAACGTTT GGACGAAGGT TTCGCCCTGC TTCAGACGGC 

1551 ATACCAAATC AACCCGGACG ATACCGCTGT CAACGACAGC ATAGGCTGGG 

1601 CGTATTACCT GAAAGGCGAC GCGGAAAGCG CGCTGCCGTA TCTGCGGTAT 

1651 TCGTTTGAAA ACGACCCCGA GCCCGAAGTT GCCGCCCATT TGGGCGAAGT 

1701 GTTGTGGGCA TTGGGCGAAC GCGATCAGGC GGTTGACGTA TGGACGCAGG 

1751 CGGCACACCT TACGGGAGAC AAGAAAATAT GGCGGGAAAC GCTCAAACGT 

1801 CACGGCATCG CATTGCCCCA ACCTTCCCGA AAACCTCGGA AATAA 

This corresponds to the amino acid sequence <SEQ ID 42; ORF9-1>: 

1 MLPNRFKMLT VLTATLIAGQ VSAAGGGAGD MKQPKEVGKV FRKQQRYSEE 

51 EIKNERARLA AVGERVNQIF TLLGGETALQ KGQAGTALAT YMLMLERTKS 

101 PEVAERALEM AVSLNAFEQA EMIYQKWRQI EPIPGKAQKR AGWLRNVLRE 

151 RGNQHLDGLE EVLAQADEGQ NRRVFLLLAQ AAVQQDGLAQ KASKAVRRAA 

201 LKYEHLPEAA VADVVFSVQG REKEKAIGAL QRLAKLDTEI LPPTLMTLRL 

251 TARKYPEILD GFFEQTDTQN LSAVWQEMEI MNLVSLHRLD DAYARLNVLL 

301 ERNPNADLYI QAAILAANRK EGASVIDGYA EKAYGRGTEE QRSRAALTAA 

351 MMYADRRDYA KVRQWLKKVS APEYLFDKGV LAAAAAVELD GGRAALRQIG 

401 RVRKLPEQQG RYFTADNLSK IQMLALSKLP DKREALRGLD KIIEKPPAGS 

451 NTELQAEALV QRSVVYDRLG KRKKMISDLE RAFRLAPDNA QIMNNLGYSL 

501 LTDSKRLDEG FALLQTAYQI NPDDTAVNDS IGWAYYLKGD AESALPYLRY 

551 SFENDPEPEV AAHLGEVLWA LGERDQAVDV WTQAAHLTGD KKIWRETLKR 

601 HGIALPQPSR KPRK* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF9 shows 89.8% identity over a 166aa overlap with an ORF (ORF9a) from strain A of N. 
meningitidis: 

10 20 30 40 50 

orf 9 . pep RFKMLTVLTATLIAGQVSAAGGGAGDMKQPKEVGKVFRKQQRYSEEEIKNERARLA 
II : I : I I : I : j : I I I : II I I : I I M I i I M I II II I I I I I i I M I M I I I 
orf 9a MLPARFTILSVLAAALLAGQAYAA--GAADAKPPKEVGKVFRKQQRYSEEEIKNERARLA 
10 20 30 40 50 

60 70 80 90 100 110 

orf9.pep AVGERVNQIFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA 
I I I I II M I I I I I I M II II I I I I I I i I I I I i I I I I I I I I I I I It I t I I I I I I M I I I I 
orf9a AVGERVNQIFTLLGXETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA 

60 70 80 90 100 110 

120 130 140 150 160 

orf 9. pep EMIYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGREEVLAQADEGQ 
I I II II I I I I I I I I I I I II I I I I I I I I I I I M I I I I I I II I I I II I I 
orf 9a EMIYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGLEEXLAQADEXQNRRVFLLLAQ 
120 130 140 150 160 170 

orf9a AAVQQDGLAQKASKAVRRAALRYEHLPEAAVADVVFSVQXREKEKAIGALQRLAKLDTEI 

180 190 200 210 220 230 

The complete length ORF9a nucleotide sequence <SEQ ID 43> is: 

1 ATGTTACCCG CCCGTTTCAC CATTTTATCT GTGCTCGCGG CAGCCCTGCT 
51 TGCCGGGCAG GCGTATGCCG CCGGCGCGGC GGATGCGAAG CCGCCGAAGG 
101 AAGTCGGAAA GGTTTTCAGA AAGCAGCAGC GTTACAGCGA GGAAGAAATC 

151 AAAAACGAAC GCGCACGGCT TGCGGCAGTG GGCGAGCGGG TTAATCAGAT 

201 ATTTACGTTG CTGGGANGGG AAACCGCCTT GCAAAAGGGG CAGGCGGGAA 

251 CGGCTCTGGC AACCTATATG CTGATGTTGG AACGCACAAA ATCCCCCGAA 

301 GTCGCCGAAC GCGCCTTGGA AATGGCCGTG TCNCTGAACG CGTTTGAACA 

351 GGCGGAAATG ATTTATCAGA AATGGCGGCA GATTGAGCCT ATACCGGGTA 

401 AGGCGCAAAA ACGGGCGGGG TGGCTGCGGA ACGTGCTGAG GGAAAGAGGA 

451 AATCAGCATC TAGACGGACT GGAAGAANTG CTGGCTCAGG CGGACGAANG 

501 ACAGAACCGC AGGGTGTTTT TATTGTTGGC ACAAGCCGCC GTGCAACAGG 

551 ACGGGTTGGC GCAAAAAGCA TCGAAAGCGG TTCGCCGCGC GGCGTTGAGA 

601 TATGAACATC TGCCCGAAGC GGCGGTTGCC GATGTGGTGT TCAGCGTACA 

651 GGNACGCGAA AAGGAAAAGG CAATCGGAGC TTTGCAGCGT TTGGCGAAGC 

701 TCGATACGGA AATATTGCCC CCCACTTTAA TGACGTTGCG TCTGACTGCA 

751 CGCAAATATC CCGAAATACT CGACGGCTTT TTCGAGCAGA CAGACACCCA 

801 AAACCTTTCG GCCGTCTGGC AGGAAATGGA AATTATGAAT CTGGTTTCCC 

851 TGCACAGGCT GGATGATGCC TATGCGCGTT TGAACGTGCT GTTGGAACGC 

901 AATCCGAATG CAGACCTGTA TATTCAGGCA GCGATATTGG CGGCAAACCG 

951 AAAAGAANGT GCTTCCGTTA TCGACGGCTA CGCCGAAAAG GCATACGGCA 

1001 GGGGGACGGG GGAACAGCGG GGCAGGGCGG CAATGACGGC GGCGATGATA 

1051 TATGCCGACC GAAGGGATTA CACCAAAGTC AGGCAGTGGT TGAAAAAAGT 

1101 GTCCGCGCCG GAATACCTGT TCGACAAAGG TGTGCTGGCG GCTGCGGCGG 

1151 CTGTCGAGTT GGACNGCGGC AGGGCGGCTT TGCGGCAGAT CGGCAGGGTG 

1201 CGGAAACTTC CCGAACAGCA GGGGCGGTAT TTTACGGCAG ACAATTTGTC 

1251 CAAAATACAG ATGTTCGCCC TGTCGAAGCT GCCCGACAAA CGGGAGGCTT 

1301 TGAGGGGGTT GGACAAGATT ATCGAAAAAC CGCCTGCCGG CAGTAATACA 

1351 GAGTTACAGG CAGAGGCATT GGTACAGCGG TCAGTTGTTT ACGATCGGCT 

1401 TGGCAAGCGG AAAAAAATGA TTTCAGATCT TGAAAGGGCG TTCAGGCTTG 

1451 CACCCGATAA CGCTCAGATT ATGAATAATC TGGGCTACAG CCTGCTTTCC 

1501 GATTCCAAAC GTTTGGACGA AGGCTTCGCC CTGCTTCAGA CGGCATACCA 

1551 AATCAACCCG GACGATACCG CTGTCAACGA CAGCATAGGC TGGGCGTATT 

1601 ACCTGAAANG CGACGCGGAA AGCGCGCTGC CGTATCTGCG GTATTCGTTT 

1651 GAAAACGACC CCGAGCCCGA AGTTGCCGCC CATTTGGGCG AAGTGTTGTG 

1701 GGCATTGGGC GAACGCGATC AGGCGGTTGA CGTATGGACG CAGGCGGCAC 

1751 ACCTTACGGG AGACAAGAAA ATATGGCGGG AAACGCTCAA ACGTCACGGC 

1801 ATCGCATTGC CCCAACCTTC CCGAAAACCT CGGAAATAA 

This encodes a protein having amino acid sequence <SEQ ID 44>: 

1 MLPARFTILS VLAAALLAGQ AYAAGAADAK PPKEVGKVFR KQQRYSEEEI 

51 KNERARLAAV GERVNQIFTL LGXETALQKG QAGTALATYM LMLERTKSPE 

101 VAERALEMAV SLNAFEQAEM IYQKWRQIEP IPGKAQKRAG WLRNVLRERG 

151 NQHLDGLEEX LAQADEXQNR RVFLLLAQAA VQQDGLAQKA SKAVRRAALR 

201 YEHLPEAAVA DVVFSVQXRE KEKAIGALQR LAKLDTEILP PTLMTLRLTA 

251 RKYPEILDGF FEQTDTQNLS AVWQEMEIMN LVSLHRLDDA YARLNVLLER 

301 NPNADLYIQA AILAANRKEX ASVIDGYAEK AYGRGTGEQR GRAAMTAAMI 

351 YADRRDYTKV RQWLKKVSAP EYLFDKGVLA AAAAVELDXG RAALRQIGRV 

401 RKLPEQQGRY FTADNLSKIQ MFALSKLPDK REALRGLDKI IEKPPAGSNT 

451 ELQAEALVQR SVVYDRLGKR KKMISDLERA FRLAPDNAQI MNNLGYSLLS 

501 DSKRLDEGFA LLQTAYQINP DDTAVNDSIG WAYYLKXDAE SALPYLRYSF 

551 ENDPEPEVAA HLGEVLWALG ERDQAVDVWT QAAHLTGDKK IWRETLKRHG 

601 IALPQPSRKP RK* 



ORF9a and ORF9-1 show 95.3% identity in 614 aa overlap: 

10 20 30 40 50 

orf9a.pep MLPARFTILSVLAAALLAGQAYAAG--AADAKPPKEVGKVFRKQQRYSEEEIKNERARLA 
III II : I : I I : I : j : I I I : III I : 1 I I I I I I I II I I I I 1 I I I M I I I I 1 I I M 
orf9-1 MLPNRFKMLTVLTATLIAGQVSAAGGGAGDMKQPKEVGKVFRKQQRYSEEEIKNERARLA 
10 20 30 40 50 60 

60 70 80 90 100 110 

orf9a.pep AVGERVNQIFTLLGXETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA 
I I I I M I II I I I I I I I I I I I I I I I I I I M I ! I I I I M I II I I I 1 I || | | ! | | | | | | | | | 
orf9-1 AVGERVNQIFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA 
70 80 90 100 110 120 

120 130 140 150 160 170 

orf9a.pep EMIYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGLEEXLAQADEXQNRRVFLLLAQ 
I I I I I 1 I M 1 I I I 1 I I I I I II 1 I I I I I I M I 11 I I I I I M I I I I I I I I I I I | I M II I 



orf9-1 EMIYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGLEEVLAQADEGQNRRVFLLLAQ 
130 140 150 160 170 180 

180 190 200 210 220 230 

orf9a.pep AAVQQDGLAQKASKAVRRAALRYEHLPEAAVADVVFSVQXREKEKAIGALQRLAKLDTEI 

I M I I I I I I I I I t I I I I it I I : I I I I I I I I I I I I I I I I I I I I I M I I I ! I M I I I II I I 
orf9-1 AAVQQDGLAQKASKAVRRAALKYEHLPEAAVADVVFSVQGREKEKAIGALQRLAKLDTEI 

190 200 210 220 230 240 

240 250 260 270 280 290 

orf9a.pep LPPTLMTLRLTARKYPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLHRLDDAYARLNVLL 
          |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf9-1    LPPTLMTLRLTARKYPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLHRLDDAYARLNVLL 
250 260 270 280 290 300 


300 310 320 330 340 350 

orf 9a. pep ERNPNADLYIQAAILAANRKEXASVIDGYAEKAYGRGTGEQRGRAAMTAAMIYADRRDYT 
I II I t II M I i I I It I I I I i I II I I I I I I I I I I I I I I I I I : I I I : I I I I :( I I I I I I : 
orf 9-1 ERNPNADLYIQAAILAANRKEGASVIDGYAEKAYGRGTEEQRSRAALTAAMMYADRRDYA 
310 320 330 340 350 360 

360 370 380 390 400 410 

orf 9a . pep KVRQWLKKVSAPEYLFDKGVLAAAAAVELDXGRAALRQIGRVRKLPEQQGRYFTADNLSK 
I I I I I I I I I I II 11 I I I M I 1 I I I I I I I I I I II I II I I I I I II I I I I II M I I I I I I I I 
orf9-1 KVRQWLKKVSAPEYLFDKGVLAAAAAVELDGGRAALRQIGRVRKLPEQQGRYFTADNLSK 

370 380 390 400 410 420 

420 430 440 450 460 470 

orf9a.pep IQMFALSKLPDKREALRGLDKIIEKPPAGSNTELQAEALVQRSVVYDRLGKRKKMISDLE 
30 I I I : I I I I I M I I I I M I t I I I I I I I I I I I I I I ( I (I M I I t M I II I I I M ( I I I I I I I 

orf9-1 IQMLALSKLPDKREALRGLDKIIEKPPAGSNTELQAEALVQRSVVYDRLGKRKKMISDLE 
430 440 450 460 470 480 

480 490 500 510 520 530 

orf9a.pep RAFRLAPDNAQIMNNLGYSLLSDSKRLDEGFALLQTAYQINPDDTAVNDSIGWAYYLKXD 

I I I 1 II I I I I I I I I I I I I I I I : I I I 11 I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I 
orf 9-1 RAFRLAPDNAQIMNNLGYSLLTDSKRLDEGFALLQTAYQINPDDTAVNDSIGWAYYLKGD 
490 500 510 520 530 540 

540 550 560 570 580 590 

orf9a.pep AESALPYLRYSFENDPEPEVAAHLGEVLWALGERDQAVDVWTQAAHLTGDKKIWRETLKR 
          |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf9-1    AESALPYLRYSFENDPEPEVAAHLGEVLWALGERDQAVDVWTQAAHLTGDKKIWRETLKR 

550 560 570 580 590 600 


600 610 
orf9a.pep HGIALPQPSRKPRKX 
          ||||||||||||||| 
orf9-1    HGIALPQPSRKPRKX 

610 

Homology with a predicted ORF from N. gonorrhoeae 

ORF9 shows 82.8% identity over a 163aa overlap with a predicted ORF (ORF9.ng) from N. 
gonorrhoeae: 

orf9 RFKMLTVLTATLIAGQVSAAGGGAGDMKQPKEVGKVFRKQQRYSEEEIKNERAR 54 

II : I : I I : I : I : I I I : II i I : I : : I I I I I I I : II :: I I I I I I I I I 1 I I I 
orf9ng MIMLPARFTILSVLAAALLAGQAYAA--GAADVELPKEVGKVLRKHRRYSEEEIKNERAR 58 

orf9 LAAVGERVNQIFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFE 114 

60 I I I I I I I I I :: II I M I I I I I I I I I I I I I III I M I I I I I I I I i M I I I i I I II I I I I I I 

orf9ng LAAVGERVNRVFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFE 118 

orf9 QAEMIYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGREEVLAQADEGQ 166 

_ I I I I I I M I I II I I I I I : I I I I I II I I I I : I I I I I I I I I I I : I 

orf9ng QAEMIYQKWRQIEPIPGEAQKPAGWLRNVLKEGGNPHLDRLEEVPAQSDYVHQPMIFLLL 178 

The ORF9ng nucleotide sequence <SEQ ID 45> was predicted to encode a protein including amino 
acid sequence <SEQ ID 46>: 

1 MIMLPARFTI LSVLAAALLA GQAYAAGAAD VELPKEVGKV LRKHRRYSEE 

51 EIKNERARLA AVGERVNRVF TLLGGETALQ KGQAGTALAT YMLMLERTKS 

101 PEVAERALEM AVSLNAFEQA EMIYQKWRQI EPIPGEAQKP AGWLRNVLKE 

151 GGNPHLDRLE EVPAQSDYVH QPMIFLLLVQ AAVQHGGVAQ KPSKAVRPAA 

201 YNYEVLPETA GADAVFCVQG PQYEKAIQSF PPCGRNPQTE NIAPPFNELF 

251 RPTARPISPK LLQRFFRTEP NLAKPFRPPG PEMETYQTGF PRPLTRNNPT 

Amino acids 1-28 are a putative leader sequence, and 173-189 are predicted to be a transmembrane 
domain. 
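
The source does not state which program produced the leader and transmembrane predictions. As a rough, independent illustration of why the N-terminus reads as a leader-like segment, the sketch below computes mean hydropathy with the Kyte-Doolittle scale, which is a substitute chosen for this example and not the method of the original analysis. The function names and the window size of 19 are assumptions. The input is the first 50 residues of SEQ ID 46.

```python
# Sketch: mean Kyte-Doolittle hydropathy over a residue range or sliding window.
# This scale is an illustrative stand-in; the original prediction method is not stated.

KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq: str, start: int, end: int) -> float:
    """Mean hydropathy of residues start..end (1-based, inclusive)."""
    segment = seq[start - 1:end]
    return sum(KYTE_DOOLITTLE[aa] for aa in segment) / len(segment)


def sliding_hydropathy(seq: str, window: int = 19) -> list:
    """Mean hydropathy for every full window along the sequence."""
    return [
        mean_hydropathy(seq, i + 1, i + window)
        for i in range(len(seq) - window + 1)
    ]


# Residues 1-50 of SEQ ID 46, with layout spaces removed.
N_TERM = "".join("MIMLPARFTI LSVLAAALLA GQAYAAGAAD VELPKEVGKV LRKHRRYSEE".split())

leader = mean_hydropathy(N_TERM, 1, 28)      # putative leader, residues 1-28
following = mean_hydropathy(N_TERM, 29, 50)  # residues immediately after it
print(f"mean hydropathy 1-28:  {leader:+.2f}")
print(f"mean hydropathy 29-50: {following:+.2f}")
```

On this scale the first 28 residues have a positive mean and the following 22 a negative one, which is consistent with a hydrophobic leader followed by a hydrophilic region. A dedicated signal-peptide or topology predictor would be needed for anything beyond this rough check.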



Further sequence analysis revealed the complete length ORF9ng DNA sequence <SEQ ID 47>: 

1 ATGTTACCCG CCCGTTTCAC TATTTTATCT GTCCTCGCAG CAGCCCTGCT 

51 TGCCGGACAG GCGTATGCTG CCGGCGCGGC GGATGTGGAG CTGCCGAAGG 

101 AAGTCGGAAA GGTTTTAAGG AAACATCGGC GTTACAGCGA GGAAGAAATC 

151 AAAAACGAAC GCGCACGGCT TGCGGCAGTG GGCGAACGGG TCAACAGGGT 

201 GTTTACGCTG TTGGGCGGTG AAACGGCTTT GCAGAAAGGG CAGGCGGGAA 

251 CGGCTCTGGC AACCTATATG CTGATGTTGG AACGCACAAA ATCCCCCGAA 

301 GTCGCCGAAC GCGCCTTGGA AATGGCCGTG TCGCTGAACG CGTTTGAACA 

351 GGCGGAAATG ATTTATCAGA AATGgcggca gatcgagcct ataCcgggtg 

401 aggcgcaaaa accgGcgggG tggctgcgga acgtattgaa ggaagggGGa 

451 aaTCAGCATC TGGAcgggtt gaaagaggTG CtggcgcaAT cggacgatGT 

501 GCAAAAAcgc aggaTATTTT TGCTGCTGGT GCAAGCCGCC GTGCagcagg 

551 gTGGGGTGGC TCAAAAAGCA TCGAAAGCGG TTCGCcgtgc GGcgttgaAG 

601 TATGAACATC TGCCcgaagc ggcggTTGCC GATGcggTGT TCGGCGTACA 

651 GGGACGCGAA AAGGAAAagg caaTCGAAGC TTTGCAGCGT TTGGCGAAGC 

701 TCGATACGGA AATATTGCCC CCCACTTTAA TGACGTTGCG TCTGACTGCA 

751 CGCAAATATC CCGAAATACT CGACGGCTTT TTCGAGCAGA CAGACACCCA 

801 AAACCTTTCG GCCGTCTGGC AGGAAATGGA AATTATGAAT CTGGTTTCCC 

851 TGCGTAAGCC GGATGATGCC TATGCGCGTT TGAACGTGCT GTTGGAACAC 

901 AACCCGAATG CAAACCTGTA TATTCAGGCG GCGATATTGG CGGCAAACCG 

951 AAAAGAAGGT GCGTCCGTTA TCGACGGCTA CGCCGAAAAG GCATACGGCA 

1001 GGGGGACGGG GGAACAGCGG GGCagggcgg cAATgacggc GGCGATGATA 

1051 TATGCCGACC GCAGGGATTA CGCCAAAGTC AGGCAGTGGT TGAAAAAAGT 

1101 GTCCGCGCCG GAATACCTGT TCGACAAAGG CGTGCTGGCG GCTGCGGCGG 

1151 CTGCCGAATT GGACGGAGGC CGGGCGGCTT TGCGGCAGAT CGGCAGGGTG 

1201 CGGAAACTTC CCGAACAGCA GGGGCGGTAT TTTACGGCAG ACAATTTGTC 

1251 CAAAATACAG ATGCTCGCCC TGTCGAAGCT GCCCGACAAA CGGGAAGCCC 

1301 TGATCGGGCT GAACAACATC ATCGCCAAAC TTTCGGCGGC GGGAAGCACG 

1351 GAACCTTTGG CGGAAGCATT GGCACAGCGT TCCATTATTT ACGaacAGTT 

1401 cggCAAACGG GGAAAAATGA TTGCCGACCT tgaAACcgcg CTCAAACTTA 

1451 CGCCCGATAA TGCACAAATT ATGAATAATC TGGGCTACAG CCTGCTTTCC 

1501 GATTCCAAAC GTTTGGACGA GGGTTTCGCC CTGCTTCAGA CGGCATACCA 

1551 AATCAACCCG GACGATACCG CCGTTAACGA CAGCATAGGC TGGGCGTATT 

1601 ACCTGAAAGG CGACgcggaA AGCGCGCTGC CGTATCTGcg gtattcgttt 

1651 gAAAACGACC CCGAGCCCGA AGTTGCCGCC CATTTGGGCG AAGTGTTGTG 

1701 GGCATTGGGC GAACGCGATC AGGCGGTTGA CGTATGGACG CAGGCGGCAC 

1751 ACCTTAGGGG AGACAAGAAA ATATGGCGGG AGACGCTCAA ACGCTACGGA 

1801 ATCGCCTTGC CCGAGCCTTC CCGAAAACCC CGGAAATAA 

This encodes a protein having amino acid sequence <SEQ ID 48>: 

1 MLPARFTILS VLAAALLAGQ AYAAGAADVE LPKEVGKVLR KHRRYSEEEI 

51 KNERARLAAV GERVNRVFTL LGGETALQKG QAGTALATYM LMLERTKSPE 

101 VAERALEMAV SLNAFEQAEM IYQKWRQIEP IPGEAQKPAG WLRNVLKEGG 

151 NQHLDGLKEV LAQSDDVQKR RIFLLLVQAA VQQGGVAQKA SKAVRRAALK 

201 YEHLPEAAVA DAVFGVQGRE KEKAIEALQR LAKLDTEILP PTLMTLRLTA 

251 RKYPEILDGF FEQTDTQNLS AVWQEMEIMN LVSLRKPDDA YARLNVLLEH 

301 NPNANLYIQA AILAANRKEG ASVIDGYAEK AYGRGTGEQR GRAAMTAAMI 

351 YADRRDYAKV RQWLKKVSAP EYLFDKGVLA AAAAAELDGG RAALRQIGRV 

401 RKLPEQQGRY FTADNLSKIQ MLALSKLPDK REALIGLNNI IAKLSAAGST 

451 EPLAEALAQR SIIYEQFGKR GKMIADLETA LKLTPDNAQI MNNLGYSLLS 
501 DSKRLDEGFA LLQTAYQINP DDTAVNDSIG WAYYLKGDAE SALPYLRYSF 
551 ENDPEPEVAA HLGEVLWALG ERDQAVDVWT QAAHLRGDKK IWRETLKRYG 
601 IALPEPSRKP RK* 



ORF9ng and ORF9-1 show 88.1% identity in 614 aa overlap: 






10 20 30 40 50 60 

orf 9-1 pep MLPNRFKMLTVLTATLIAGQVSAAGGGAGDMKQPKEVGKVFRKQQRYSEEEIKNERARLA 
(II || : | : I I : I : I : I I I : lit I : I : : I 1 I I M I : I I : : I I M I I ! II I II I M 
orf 9ng-l MLPARFTILSVLAAALLAGQAYAAG--AADVELPKEVGKVLRKHRRYSEEEIKNERARLA 
10 20 30 40 50 

70 80 90 100 110 120 

orf 9-1 pep AVGERVNQIFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA 

| I M | I I :: I I I M I I 11 I I I I I I I M I M I I M I I I I I M I I I II M I I 1 I I I I M I M 
orf9ng-l AVGERVNRVFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA 

60 70 80 90 100 110 

130 140 150 160 170 180 

orf 9-1 . pep EMIYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGLEEVLAQADEGQNRRVFLLLAQ 
t I | I I I M I II I I I I : M I I I II I N I : I M I M I I I : M II I : I : 1 : I I : I t I I : I 
orf9ng-l EMIYQKWRQIEPIPGEAQKPAGWLRNVLKEGGNQHLDGLKEVLAQSDDVQKRRIFLLLVQ 
120 130 140 150 160 170 






190 200 210 220 230 240 

orf9-1.pep AAVQQDGLAQKASKAVRRAALKYEHLPEAAVADVVFSVQGREKEKAIGALQRLAKLDTEI 
Mill I : I I i II M I I II I I I M I I I I M I I I : I I : I M I M I I I I I M I I I I M I I I 
orf9ng-1 AAVQQGGVAQKASKAVRRAALKYEHLPEAAVADAVFGVQGREKEKAIEALQRLAKLDTEI 

180 190 200 210 220 230 






250 260 270 280 290 300 

orf 9-1. pep LPPTLMTLRLTARKYPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLHRLDDAYARLNVLL 
II I I I I I I 11 I I II I II II I M I II I M II I I I M I I I I 1 1 I I I I I : : M I I I I 1 I I I I 
orf9ng-l LPPTLMTLRLTARKYPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLRKPDDAYARLNVLL 
240 250 260 270 280 290 

310 320 330 340 350 360 

orf 9-1 . pep ERNPNADLYIQAAILAANRKEGASVIDGYAEKAYGRGTEEQRSRAALTAAMMYADRRDYA 
I : I M I : I I I I I M I I I I I I I ( I II II I M I I II II I I I I I : I I I : I I M : I I I I I I I I 
orf 9ng-l EHNPNANLYIQAAILAANRKEGASVIDGYAEKAYGRGTGEQRGRAAMTAAMIYADRRDYA 
300 310 320 330 340 350 

370 380 390 400 410 420 

orf 9-1 . pep KVRQWLKKVSAPEYLFDKGVLAAAAAVELDGGRAALRQIGRVRKLPEQQGRYFTADNLSK 
I I I I 1 I I I I M I I I II M I I I I I I I I : I I I I I I I I I I I I I I I I I I I II I I I I I M I I I I I 
orf9ng-1 KVRQWLKKVSAPEYLFDKGVLAAAAAAELDGGRAALRQIGRVRKLPEQQGRYFTADNLSK 

360 370 380 390 400 410 






430 440 450 460 470 480 

orf 9-1 . pep IQMLALSKLPDKREALRGLDKIIEKPPAGSNTELQAEALVQRSVVYDRLGKRKKMISDLE 
I I I I I I I I I 11 I I I I I i I : : I I I I : : : I I I I II : I I I : : I : : : II 1 I I I : 1 I 1 
orf 9ng-l IQMLALSKLPDKREALIGLNNIIAKLSAAGSTEPLAEALAQRSIIYEQFGKRGKMIADLE 
420 430 440 450 460 470 






490 500 510 520 530 540 

orf 9-1 . pep RAFRLAPDNAQIMNNLGYSLLTDSKRLDEGFALLQTAYQINPDDTAVNDSIGWAYYLKGD 
I : : I : II II I M I II I II I I : I I I II M I I I I I I I I I I M I I I I I I I I I I I I II I I II I 
orf 9ng-l TALKLTPDNAQIMNNLGYSLLSDSKRLDEGFALLQTAYQINPDDTAVNDSIGWAYYLKGD 
480 490 500 510 520 530 






550 560 570 580 590 600 

orf 9-1 . pep AESALPYLRYSFENDPEPEVAAHLGEVLWALGERDQAVDVWTQAAHLTGDKKIWRETLKR 
I I I I M I I I M 11 11 I I I I I I I I I I I I I I I I 11 I I I I I II 1 I I I I I I I 1 I II I I 1 I I I I 
orf 9ng-l AESALPYLRYSFENDPEPEVAAHLGEVLWALGERDQAVDVWTQAAHLRGDKKIWRETLKR 
540 550 560 570 580 590 

610 

orf 9-1 . pep HGIALPQPSRKPRKX 
: II II I : I I M I I M 
orf9ng-l YGIALPEPSRKPRKX 
In addition, ORF9ng shows significant homology with a hypothetical protein from P. aeruginosa:

sp|P42810|YHE3_PSEAE HYPOTHETICAL 64.8 KD PROTEIN IN HEMM-HEMA INTERGENIC REGION 

(ORF3 ) 

>gi|1072999|pir||S49376 hypothetical protein 3 - Pseudomonas aeruginosa >gi|557259

(X82071) orf3 [Pseudomonas aeruginosa] Length = 576 
Score = 128 bits (318), Expect = 1e-28

Identities = 138/587 (23%), Positives = 228/587 (38%), Gaps = 125/587 (21%) 

Query: 67 VFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQAEMIYQKWR 126

+++LL E A Q+ + AL+ Y++ ++T+ P V+ERA +A L A ++A W 
Sbjct: 53 LYSLLVAELAGQRNRFDIALSNYWQAQKTRDPGVSERAFRIAEYLGADQEALDTSLLWA 112 

Query: 127 QIEPIPGEAQKPAG WLRNVLKEGGNQHLDGLKEVLAQSDDVQKRRI 172 

+ P +AQ+ A ++ VL G+ H D L A++D + +

Sbjct: 113 RSAPDNLDAQRAAAIQLARAGRYEESMVYMEKVLNGQGDTHFDFLALSAAETDPDTRAGL 172 

Query: 173 FXXXXXXXXXXXXXXXKASKAVRRAALKYEHLPEAAVADAVFGVQGREKEKAIEALQRLA 232 

++ KY + + A+ Q ++A+ L+ + 

Sbjct: 173 L QSFDHLLKKYPNNGQLLFGKALLLQQDGRPDEALTLLEDNS 214






Query: 233 KLDTEILPPTLMTLRLTARK YPE I LDGFFEQT DTQNLSAVWQEME IMNLVS LRKP 287 

E+PL+L + K P+GED + + + + LV -f 
Sbjct: 215 ASRHEVAPLLLRSRLLQSMKRSDEALPLLKAGIKEHPDDKRVRLAYARL LVEQNRL 270 

Query: 288 DDAYARLNVLLEHNPN ANLYIQAAI 312 

DDA A L++ P+ A +Y++ + 

Sbjct: 271 DDAKAEFAGLVQQFFDDDDDLRFSLALVCLEAQAWDEARIYLEELVERDSHVDAAHFNLG 330 



Query: 313 -LAANRKEGASVIDGYAEKAYGRGTGEQRGRAAMTAAMIYADRRDYAKVRQWLKKVSAPE 371

LA +K+ A +D YA+ GG + T++ARDAR + P+ 

Sbjct: 331 RLAEEQKDTARALDEYAQ--VGPGNDFLPAQLRQTDVLLKAGRVDEAAQRLDKARSEQPD 388 

Query: 372 YL FDKXXXXXXXXXXXXXXXXXXRQI GRVRKL PEQQGRY FTADN LS K I QMLAL S KL P DKR 431 
Y A L 1+ ALS +

Sbjct: 389 Y A I QL YL I EAEALSNN DQQE 408 

Query: 432 EAL I GLNN 1 1 AKL SAAGSTE PLAEALAQRS 1 1 YEQ FGKRGKM I ADLETALKLT PDN AQ IM 491 
+A + + + ELL RS++ E+ +M DL + PDNA + 

Sbjct: 409 KAWQAIQEGLKQYP EDL-NLLYTRSMLAEKRNDLAQMEKDLRFVIAREPDNAMAL 462

Query: 492 NNLGYSLLSDSKRLDEGFALLQTAYQINPDDTAVNDSIGWAYYLKGDAESALPYLRYSFE 551

N LGY+L + R E L+ A+++NPDD A+ DS+GW Y +G A YLR + + 
Sbjct: 463 NALGYTLADRTTRYGEARELILKAHKLNPDDPAILDSMGWINYRQGKLADAERYLRQALQ 522



Query: 552 NDPEPEVAAHLGEVLWALGERDQAVDVWTQAAHLRGDKKIWRETLKR 598 

P+ EVAAHLGEVLWA G+A+W+ +D+R T+KR 
Sbjct: 523 RYPDHEVAAHLGEVLWAQGRQGDARAIWREYLDKQPDSDVLRRTIKR 569 



gi|2983399 (AE000710) hypothetical protein [Aquifex aeolicus] Length = 545

Score = 81.5 bits (198), Expect = 1e-14

Identities = 61/198 (30%), Positives = 98/198 (48%), Gaps = 19/198 (9%)

Query: 408 fFTADNL-SKIQMLALSKLPDKREALIGLNNIIAKLSAAGSTEPLAEALAQ 459
           " G Y A L K ++LA PDK+E L + +K + + L +
Sbjct: 335 f E D AKRL I EKAKVLA PDKKEILFLEADYYSKTKQYDKALEILKKLEKDYPNDSR 390

Query: 460
           +I+Y+ G L A++L P+N N LGYSLL +R++E L++
Sbjct: 391

Query: 514
           A + +P++ A DS+GW YYLKGD E A+ YL + E +P V H+G+VL +G +
Sbjct: 451

Query: 573
           ++A + + +A L + K
Sbjct: 511
Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
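
The BLAST summary lines quoted above report each statistic as a count, an alignment length and a percentage. A minimal sketch of how such a line can be parsed and cross-checked is given below. The helper names `parse_blast_stats` and `truncated_percent` are hypothetical, and the assumption that the reported percentage is the integer part of the ratio is inferred only from the figures quoted for the P. aeruginosa hit.

```python
import re

# Matches fields such as "Identities = 138/587 (23%)" in a BLAST summary line.
STAT_FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)"
)


def parse_blast_stats(line):
    """Map each field name to (count, alignment_length, reported_percent)."""
    return {
        name: (int(count), int(length), int(percent))
        for name, count, length, percent in STAT_FIELD.findall(line)
    }


def truncated_percent(count, length):
    """Integer percentage with the fraction dropped (an assumed convention)."""
    return 100 * count // length


summary = (
    "Identities = 138/587 (23%), Positives = 228/587 (38%), "
    "Gaps = 125/587 (21%)"
)

for name, (count, length, reported) in parse_blast_stats(summary).items():
    # Print the reported value next to the value recomputed from the counts.
    print(name, count, length, reported, truncated_percent(count, length))
```

A check of this kind is useful when a percentage in a scanned listing is hard to read, because the count and the alignment length give an independent value to compare it with.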



Example 7 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 49>:

1 AACCTCTACG CCGGCCCGCA GACCACATCC GTCATCGCAA ACATCGCCGA 

51 CAACCTGCAA CTGGCCAAAG ACTACGGCAA AGTACACTGG TTCGCCTCCC 

101 CGCTCTTCTG GCTCCTGAAC CAACTGCACA ACATCATCGG CAACTGGGGC 

151 TGGGCGATTA TCGTTTTAAC CATCATCGTC AAAGCCGTAC TGTATCCATT 

201 GACCAACGCC TCTTACCGCT CTATGGCGAA AATGCGTGCC GCCGCACCCA 

251 AACTGCAAGC CATCAAAGAG AAATACGGCG ACGACCGTAT GGCGCAACAA 

301 CAGGCGATGA TGCAGCTTTA CACAGACGAG AAAATCAACC CG^CTGGGCG 

351 GCTGCCTGCC TATGCTGTTG CAAATCCCCG TCTTCATCGG ATTGTATTGG 

401 GCATTGTTCG CCTCCGTAGA ATTGCGCCAG GCACCTTGGC TGGGTTGGAT 

451 TACCGACCTC AGCCGCGCCG ACCCCTACTA CATCCTGCCC ATCATTATGG

501 CGGCAACGAT GTTCGCCCAA ACTTATCTGA ACCCGCCGCC GAcCGACCCG 

551 ATGCagGCGA AAATGATGAA AATCATGCCG TTGGTTTTCT CsGwCrTGTT 

601 CTTCTTCTTC CCTGCCGGks TGGTATTGTA CTGGGTAGTC AACAACCTCC 

651 TGACCATCGC CCAGCAATGG CACATCAACC GCAGCATCGA AAAACAACGC 

701 GCCCAAGGCG AAGTCGTTTC CTAA

This corresponds to the amino acid sequence <SEQ ID 50; ORF11>:

1 ..NLYAGPQTTS VIANIADNLQ LAKDYGKVHW FASPLFWLLN QLHNIIGNWG

51 WAIIVLTIIV KAVLYPLTNA SYRSMAKMRA AAPKLQAIKE KYGDDRMAQQ

101 QAMMQLYTDE KINPLGGCLP MLLQIPVFIG LYWALFASVE LRQAPWLGWI

151 TDLSRADPYY ILPIIMAATM FAQTYLNPPP TDPMQAKMMK IMPLVFSXXF

201 FFFPAGXVLY WVVNNLLTIA QQWHINRSIE KQRAQGEVVS *
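
The relationship between the partial DNA sequence and the amino acid sequence above can be illustrated with a short translation sketch. It assumes the standard genetic code and the IUPAC nucleotide ambiguity codes (for example s, w, r and k, which appear in lower case in SEQ ID 49). A codon containing an ambiguity code is reported as X unless every possible expansion encodes the same residue. The function names are hypothetical and this is not presented as the software actually used to prepare the listing.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO)
}

# IUPAC nucleotide codes and the bases each one may stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_codon(codon):
    """Translate one codon; return X when the residue is not determined."""
    codon = codon.upper()
    if len(codon) != 3 or any(base not in IUPAC for base in codon):
        return "X"
    residues = {
        CODON_TABLE["".join(expansion)]
        for expansion in product(*(IUPAC[base] for base in codon))
    }
    return residues.pop() if len(residues) == 1 else "X"


def translate(dna):
    """Translate in frame 1, ignoring whitespace and any trailing partial codon."""
    dna = "".join(dna.split())
    return "".join(
        translate_codon(dna[i:i + 3]) for i in range(0, len(dna) - 2, 3)
    )


# First 21 bases of SEQ ID 49, and a stretch containing ambiguity codes.
print(translate("AACCTCTACG CCGGCCCGCA G"))
print(translate("TTGGTTTTCT CsGwCrTGTT C"))
```

Under these assumptions the codon TCs still translates to S, because both TCC and TCG encode serine, whereas GwC and rTG are reported as X.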

Further sequence analysis revealed the complete DNA sequence <SEQ ID 51>: 

1 ATGGATTTTA AAAGACTCAC GGCGTTTTTC GCCATCGCGC TGGTGATTAT 

51 GATCGGCTGG GAAAAGATGT TCCCCACTCC GAAGCCAGTC CCCGCGCCCC 

101 AACAGGCAGC ACAACAACAG GCCGTAACCG CTTCCGCCGA AGCCGCGCTC 

151 GCGCCCGCAA CGCCGATTAC CGTAACGACC GACACGGTTC AAGCCGTCAT 

201 TGATGAAAAA AGCGGCGACC TGCGCCGGCT GACCCTGCTC AAATACAAAG 

251 CAACCGGCGA CGAAAATAAA CCGTTCATCC TGTTTGGCGA CGGCAAAGAA 

301 TACACCTACG TCGCCCAATC CGAACTTTTG GACGCGCAGG GCAACAACAT 

351 TCTAAAAGGC ATCGGCTTTA GCGCACCGAA AAAACAGTAC AGCTTGGAAG 

401 GCGACAAAGT TGAAGTCCGC CTGAGCGCGC CTGAAACACG CGGTCTGAAA

451 ATCGACAAAG TTTATACTTT CACCAAAGGC AGCTATCTGG TCAACGTCCG 

501 CTTCGACATC GCCAACGGCA GCGGTCAAAC CGCCAACCTG AGCGCGGACT 

551 ACCGCATCGT CCGCGACCAC AGCGAACCCG AGGGTCAAGG TTACTTTACC 

601 CACTCTTACG TCGGCCCTGT TGTTTATACC CCTGAAGGCA ACTTCCAAAA 

651 AGTCAGCTTT TCCGACTTGG ACGACGATGC CAAATCCGGC AAATCCGAGG 

701 CCGAATACAT CCGCAAAACC CCGACCGGCT GGCTCGGCAT GATTGAACAC 

751 CACTTCATGT CCACCTGGAT TCTCCAACCT AAAGGCAGAC AAAGCGTTTG 

801 CGCCGCAGGC GAGTGCAACA TCGACATCAA ACGCCGCAAC GACAAGCTGT 

851 ACAGCACCAG CGTCAGCGTG CCTTTAGCCG CCATCCAAAA CGGCGCGAAA 

901 GCCGAAGCCT CCATCAACCT CTACGCCGGC CCGCAGACCA CATCCGTCAT 

951 CGCAAACATC GCCGACAACC TGCAACTGGC CAAAGACTAC GGCAAAGTAC 

1001 ACTGGTTCGC CTCCCCGCTC TTCTGGCTCC TGAACCAACT GCACAACATC 

1051 ATCGGCAACT GGGGCTGGGC GATTATCGTT TTAACCATCA TCGTCAAAGC 

1101 CGTACTGTAT CCATTGACCA ACGCCTCTTA CCGCTCTATG GCGAAAATGC 

1151 GTGCCGCCGC ACCCAAACTG CAAGCCATCA AAGAGAAATA CGGCGACGAC 

1201 CGTATGGCGC AACAACAGGC GATGATGCAG CTTTACACAG ACGAGAAAAT 

1251 CAACCCGCTG GGCGGCTGCC TGCCTATGCT GTTGCAAATC CCCGTCTTCA 

1301 TCGGATTGTA TTGGGCATTG TTCGCCTCCG TAGAATTGCG CCAGGCACCT 

1351 TGGCTGGGTT GGATTACCGA CCTCAGCCGC GCCGACCCCT ACTACATCCT 

1401 GCCCATCATT ATGGCGGCAA CGATGTTCGC CCAAACTTAT CTGAACCCGC 

1451 CGCCGACCGA CCCGATGCAG GCGAAAATGA TGAAAATCAT GCCGTTGGTT 

1501 TTCTCCGTCA TGTTCTTCTT CTTCCCTGCC GGTCTGGTAT TGTACTGGGT 
1551 AGTCAACAAC CTCCTGACCA TCGCCCAGCA ATGGCACATC AACCGCAGCA
1601 TCGAAAAACA ACGCGCCCAA GGCGAAGTCG TTTCCTAA 

This corresponds to the amino acid sequence <SEQ ID 52; ORF11-1>:

1 MDFKRLTAFF AIALVIMIGW EKMFPTPKPV PAPQQAAQQQ AVTASAEAAL

51 APATPITVTT DTVQAVIDEK SGDLRRLTLL KYKATGDENK PFILFGDGKE

101 YTYVAQSELL DAQGNNILKG IGFSAPKKQY SLEGDKVEVR LSAPETRGLK

151 IDKVYTFTKG SYLVNVRFDI ANGSGQTANL SADYRIVRDH SEPEGQGYFT

201 HSYVGPVVYT PEGNFQKVSF SDLDDDAKSG KSEAEYIRKT PTGWLGMIEH

251 HFMSTWILQP KGRQSVCAAG ECNIDIKRRN DKLYSTSVSV PLAAIQNGAK

301 AEASINLYAG PQTTSVIANI ADNLQLAKDY GKVHWFASPL FWLLNQLHNI

351 IGNWGWAIIV LTIIVKAVLY PLTNASYRSM AKMRAAAPKL QAIKEKYGDD

401 RMAQQQAMMQ LYTDEKINPL GGCLPMLLQI PVFIGLYWAL FASVELRQAP

451 WLGWITDLSR ADPYYILPII MAATMFAQTY LNPPPTDPMQ AKMMKIMPLV

501 FSVMFFFFPA GLVLYWVVNN LLTIAQQWHI NRSIEKQRAQ GEVVS*
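
Because ORF11-1 extends the partial ORF11 sequence, the partial sequence should reappear within the complete one wherever its residues were determined. A minimal sketch of such a check follows. It treats X as a wildcard for an undetermined residue, which is an assumption made for illustration, and the function names are hypothetical.

```python
def residues_agree(a, b, wildcard="X"):
    """Two residues agree if they are equal or either one is undetermined."""
    return a == b or a == wildcard or b == wildcard


def find_fragment(fragment, full_length, wildcard="X"):
    """Return the 0-based offset at which fragment fits full_length, or -1."""
    span = len(fragment)
    for start in range(len(full_length) - span + 1):
        window = full_length[start:start + span]
        if all(residues_agree(f, w, wildcard) for f, w in zip(fragment, window)):
            return start
    return -1


# Short stretches taken from ORF11 (partial) and ORF11-1 (complete).
print(find_fragment("NLYAGPQTTS", "AEASINLYAGPQTTSVIANI"))
print(find_fragment("LVFSXXFFFFPAGXVLY", "IMPLVFSVMFFFFPAGLVLYWVVNN"))
```

The first call locates the start of the partial sequence inside residues 301-320 of ORF11-1. The second shows that the undetermined positions of ORF11 are compatible with the residues resolved in ORF11-1.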

Computer analysis of this amino acid sequence gave the following results:

Homology with a 60kDa inner-membrane protein (accession P25754) of Pseudomonas putida
ORF11 and the 60kDa protein show 58% aa identity in 229 aa overlap (BLASTp).



ORF11   2 LYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIIVLTIIVK 61
          LYAGP+ S + ++ L+L DYG + + A P+FWLL +H+++GNWGW+IIVLT+++K
60K   324

ORF11  62 AVLYPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRXXXXXXXXXLYTDEKINPLGGCLPM 121
          + +PL+ ASYRSMA+MRA APKL A+KE++GDDR LY EKINPLGGCLP+
60K   384

ORF11 122
          L+Q+PVF+ LYW L SVE+RQAPW+ WITDLS DP++ILPIIM ATMF Q LNP P
60K   444

ORF11 182
          DPMQAK+MK+MP++ PAG VLYWWNN L+I+QQW+I R IE
60K   504



Homology with a predicted ORF from N. meningitidis (strain A)
ORF11 shows 97.9% identity over a 240aa overlap with an ORF (ORF11a) from strain A of N.
meningitidis:

                                                 10        20        30
orf11.pep                                NLYAGPQTTSVIANIADNLQLAKDYGKVHW
                                         ||||||||||||||||||||| ||||||||
orf11a     IKRRNDKLYSTSVSVPLAAIQNGAKSXASINLYAGPQTTSVIANIADNLQLXKDYGKVHW
                  280       290       300       310       320       330

                   40        50        60        70        80        90
orf11.pep  FASPLFWLLNQLHNIIGNWGWAIIVLTIIVKAVLYPLTNASYRSMAKMRAAAPKLQAIKE
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf11a     FASPLFWLLNQLHNIIGNWGWAIIVLTIIVKAVLYPLTNASYRSMAKMRAAAPKLQAIKE
                  340       350       360       370       380       390

                  100       110       120       130       140       150
orf11.pep  KYGDDRMAQQQAMMQLYTDEKINPLGGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWI
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf11a     KYGDDRMAQQQAMMQLYTDEKINPLGGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWI
                  400       410       420       430       440       450

160 170 180 190 200 210

orf 11 . pep TDLSRADPYYILPIIMAATMFAQTYLNPPPTDPMQAKMMKIMPLVFSXXFFFFPAGXVLY 
! I ! I I ! I I! I ! I I I I I I I I ! I! I I M I ! I I ! ! I M I I I I ! I I I I I MM! MM Ml 
orf 11a TDLSRADPYYILPIIMAATMFAQTYLNPPPTDPMQAKMMKIMPLVXSXXFFXFPAGLVLY
460 470 480 490 500 510 
                  220       230       240
orf11.pep  WVVNNLLTIAQQWHINRSIEKQRAQGEVVSX
           ||:||||||||||||||||||||||||||||
orf11a     WVINNLLTIAQQWHINRSIEKQRAQGEVVSX
                  520       530       540

The complete length ORF11a nucleotide sequence <SEQ ID 53> is:

1 ANGGATTTTA AAAGACTCAC NGNGTTTTTC GCCATCGCAC TGGTGATTAT 

51 GATCGGATNG NAAANGATGT TCCCCACTCC GAAGCCCGTC CCCGCGCCCC 

101 AACAGACGGC ACAACAACAG GCCGTAANCG CTTCCGCCGA AGCCGCGCTC 

151 GCGCCCGNAN CGCCGATTAC CGTAACGACC GACACGGTTC AAGCCGTCAT 

201 TGATGAAAAA AGCGGCGACC TGCGCCGGCT GACCCTGCTC AAATACAAAG 

251 CAACCGGCGA CNAAAATAAA CCGTTCATCC TGTTTGGCGA CGGCAAANAA 

301 TACACCTACN TCGCCCANTC CGAACTTTTG GACGCGCAGG GCAACAACAT 

351 TCTAAAAGGC ATCGGCTTTA GCGCACCGAA AAAACAGTAC AGCTTGGAAG 

401 GCGACAAAGT TGAAGTCCGC CTGAGCGCAC CTGAAACACG CGGTCTGAAA

451 ATCGACAAAG TTTATACTTT CACCAAAGGC AGCTATCTGG TCAACGTCCG

501 CTTCGACATC GCCAACGGCA GCGGTCAAAC CGCCAACCTG AGCGCGGACT 

551 ACCGCATCGT CCGCGACCAC AGCGAACCCG AGGGTCAAGG CTACTTTACC 

601 CACTCTTACG TCGGCCCTGT TGTTTATACC CCTGAAGGCA ACTTCCAAAA 

651 AGTCAGCTTC TCCGACTTGG ACGACGATGC CAANTCCGGN AAATCCGAGG 

701 CCGAATACAT CCGCAAAACC CNGACCGGCT GGCTCGGCAT GATTGAACAC

751 CACTTCATGT CCACCTGGAT CCTCCAACCC AAAGGCGGAC AAAGCGTTTG

801 CGCCGCTGGC GACTGCNGTA TNGACATCAA ACGCCGCAAC GACAAGCTGT 

851 ACAGCACCAG CGTCAGCGTG CCTTTAGCCG CTATCCAAAA CGGTGCGAAA 

901 TCCNAAGCCT CCATCAACCT CTACGCCGGC CCACAGACCA CATCNGTTAT 

951 CGCAAACATC GCCGACAACC TGCAACTGGN CAAAGACTAC GGCAAAGTAC 

1001 ACTGGTTCGC CTCCCCCCTC TTTTGGCTTT TGAACCAACT GCACAACATC 

1051 ATCGGCAACT GGGGCTGGGC GATTATCGTT TTAACCATCA TCGTCAAAGC 

1101 CGTACTGTAT CCATTGACCA ACGCCTCTTA CCGTTCGATG GCGAAAATGC 

1151 GTGCCGCCGC GCCCAAACTG CAAGCCATCA AAGAGAAATA CGGCGACGAC 

1201 CGTATGGCGC AGCAACAAGC CATGATGCAG CTTTACACAG ACGAGAAAAT

1251 CAACCCGCTG GGCGGCTGCC TGCCTATGCT GTTGCAAATC CCCGTCTTCA 

1301 TCGGATTGTA TTGGGCATTG TTCGCCTCCG TAGAATTGCG CCAGGCACCT 

1351 TGGCTGGGTT GGATTACCGA CCTCAGCCGC GCCGACCCNT ACTACATCCT 

1401 GCCCATCATT ATGGCGGCAA CGATGTTCGC CCAAACCTAT CTGAACCCGC

1451 CGCCGACCGA CCCGATGCAG GCGAAAATGA TGAAAATCAT GCCTTTGGTT

1501 NTNTCNNNNA NGTTCTTCNN CTTCCCTGCC GGTCTGGTAT TGTACTGGGT 

1551 GATCAACAAC CTCCTGACCA TCGCCCAGCA ATGGCACATC AACCGCAGCA 

1601 TCGAAAAACA ACGCGCCCAA GGCGAAGTCG TTTCCTAA 

This encodes a protein having amino acid sequence <SEQ ID 54>: 



1 XDFKRLTXFF AIALVIMIGX XXMFPTPKPV PAPQQTAQQQ AVXASAEAAL 

51 APXXPITVTT DTVQAVIDEK SGDLRRLTLL KYKATGDXNK PFILFGDGKX 

101 YTYXAXSELL DAQGNNILKG IGFSAPKKQY SLEGDKVEVR LSAPETRGLK 

151 IDKVYTFTKG SYLVNVRFDI ANGSGQTANL SADYRIVRDH SEPEGQGYFT 

201 HSYVGPVVYT PEGNFQKVSF SDLDDDAXSG KSEAEYIRKT XTGWLGMIEH 

251 HFMSTWILQP KGGQSVCAAG DCXXDIKRRN DKLYSTSVSV PLAAIQNGAK 

301 SXASINLYAG PQTTSVIANI ADNLQLXKDY GKVHWFASPL FWLLNQLHNI 

351 IGNWGWAIIV LTIIVKAVLY PLTNASYRSM AKMRAAAPKL QAIKEKYGDD

401 RMAQQQAMMQ LYTDEKINPL GGCLPMLLQI PVFIGLYWAL FASVELRQAP

451 WLGWITDLSR ADPYYILPII MAATMFAQTY LNPPPTDPMQ AKMMKIMPLV

501 XSXXFFXFPA GLVLYWVINN LLTIAQQWHI NRSIEKQRAQ GEVVS*

ORF11a and ORF11-1 show 95.2% identity in 544 aa overlap:



10 20 30 40 50 60 

orf 11a . pep XDFKRLTXFFAIALVIMIGXXXMFPTPKPVPAPQQTAQQQAVXASAEAALAPXXPITVTT 
(Mill M M II I I II I I I I M I I I II I I I : I I II I I : I M I M I II : I I I I I I 
orf 11-1 MDFKRLTAFFAIALVIMIGWEKMFPTPKPVPAPQQAAQQQAVTASAEAALAPATPITVTT 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 11a . pep DTVQAVIDEKSGDLRRLTLLKYKATGDXNKPFILFGDGKXYTYXAXSELLDAQGNNILKG 
I I I II I I I I I I I I I M I I II I II I I I I I M I I I I I II I II I I II I II I I I I I I I I I 
orf 11-1 DTVQAVIDEKSGDLRRLTLLKYKATGDENKPFILFGDGKEYTYVAQSELLDAQGNNILKG 
70 80 90 100 110 120

130 140 150 160 170 180 

orf11a.pep  IGFSAPKKQYSLEGDKVEVRLSAPETRGLKIDKVYTFTKGSYLVNVRFDIANGSGQTANL
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf11-1     IGFSAPKKQYSLEGDKVEVRLSAPETRGLKIDKVYTFTKGSYLVNVRFDIANGSGQTANL

130 140 150 160 170 180 

190 200 210 220 230 240 

orf11a.pep  SADYRIVRDHSEPEGQGYFTHSYVGPVVYTPEGNFQKVSFSDLDDDAXSGKSEAEYIRKT
            ||||||||||||||||||||||||||||||||||||||||||||||| ||||||||||||
orf11-1     SADYRIVRDHSEPEGQGYFTHSYVGPVVYTPEGNFQKVSFSDLDDDAKSGKSEAEYIRKT

190 200 210 220 230 240 

15 250 260 270 280 290 300 

orflla pep XTGWLGMIEHHFMSTWILQPKGGQSVCAAGDCXXDIKRRNDKLYSTSVSVPLAAIQNGAK 
I I M I I I II I II I I M II I I I I I I I II I : I M M I I I I M I I I I I I M II II I M I 
orfll-1 PTGWLGMIEHHFMSTWILQPKGRQSVCAAGECNIDIKRRNDKLYSTSVSVPLAAIQNGAK 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf11a.pep  SXASINLYAGPQTTSVIANIADNLQLXKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIIV
            : |||||||||||||||||||||||| |||||||||||||||||||||||||||||||||
orf11-1     AEASINLYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIIV

25 310 320 330 340 350 360 

370 380 390 400 410 420 

orf11a.pep  LTIIVKAVLYPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRMAQQQAMMQLYTDEKINPL
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf11-1     LTIIVKAVLYPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRMAQQQAMMQLYTDEKINPL

370 380 390 400 410 420 

430 440 450 460 470 480 

orf11a.pep  GGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTY
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf11-1     GGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTY

430 440 450 460 470 480 

490 500 510 520 530 540 

orf11a.pep  LNPPPTDPMQAKMMKIMPLVXSXXFFXFPAGLVLYWVINNLLTIAQQWHINRSIEKQRAQ
            |||||||||||||||||||| |  || ||||||||||:||||||||||||||||||||||
orf11-1     LNPPPTDPMQAKMMKIMPLVFSVMFFFFPAGLVLYWVVNNLLTIAQQWHINRSIEKQRAQ

490 500 510 520 530 540 



orf11a.pep  GEVVSX
            ||||||
orf11-1     GEVVSX



Homology with a predicted ORF from N. gonorrhoeae
ORF11 shows 96.3% identity over a 240aa overlap with a predicted ORF (ORF11.ng) from N.
gonorrhoeae:

orf11       NLYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIIVLT 57
            |||||||||||||||||||||||||||||||||||||||||||||||||||||:|||
orf11ng  MAVNLYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIVVLT 60

orf11    IIVKAVLYPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRMAQQQAMMQLYTDEKINPLGG 117
         ||||||||||||||||||||||||||:||:|||||||||||||||||||: ||:||||||
orf11ng  IIVKAVLYPLTNASYRSMAKMRAAAPELQTIKEKYGDDRMAQQQAMMQLFEDEEINPLGG 120

orf11    CLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTYLN 177
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf11ng  CLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTYLN 180

orf11    PPPTDPMQAKMMKIMPLVFSXXFFFFPAGXVLYWVVNNLLTIAQQWHINRSIEKQRAQGE 237
         ||||||||||||||||||||  ||||||| ||||||||||||||||||||||||||||||
orf11ng  PPPTDPMQAKMMKIMPLVFSVMFFFFPAGLVLYWVVNNLLTIAQQWHINRSIEKQRAQGE 240

orf11    VVS 240
         |||
orf11ng  VVS 243

An ORF11ng nucleotide sequence <SEQ ID 55> was predicted to encode a protein having amino
acid sequence <SEQ ID 56>:

1 MAVNLYAGPQ TTSVIANIAD NLQLAKDYGK VHWFASPLFW LLNQLHNIIG

51 NWGWAIVVLT IIVKAVLYPL TNASYRSMAK MRAAAPELQT IKEKYGDDRM

101 AQQQAMMQLF EDEEINPLGG CLPMLLQIPV FIGLYWALFA SVELRQAPWL

151 GWITDLSRAD PYYILPIIMA ATMFAQTYLN PPPTDPMQAK MMKIMPLVFS

201 VMFFFFPAGL VLYWVVNNLL TIAQQWHINR SIEKQRAQGE VVS*

Further sequence analysis revealed the complete gonococcal DNA sequence <SEQ ID 57> to be:

1 ATGGATTTTA AAAGACTCAC GGCGTTTTTC GCCATCGCGC TGGTGATTAT 

51 GATCGGCTGG GAAAAAATGT TCCCCACCCC GAAACCCGTC CCCGCGCCCC 

101 AACAGGCGGC ACAAAAACAG GCAGCAACCG CTTCCGCCGA AGCCGCGCTC 

151 GCGCCCGCAA CGCCGATTAC CGTAACGACC GACACGGTTC AAGCCGTTAT 

201 TGATGAAAAA AGTGGCGACC TGCGCCGGCT GACCCTGCTC AAATACAAAG

251 CAACCGGCGA CGAAAACAAA CCGTTCGTCC TGTTTGGCGA CGGCAAAGAA

301 TACACCTACG TCGCCCAATC CGAACTTTTG GACGCGCAGG GCAACAACAT 

351 TCTGAAAGGC ATCGGCTTTA GCGCACCGAA AAAACAGTAC ACCCTCAACG 

401 GCGACACAGT CGAAGTCCGC CTGAGCGCGC CCGAAACCAA CGGACTGAAA 

451 ATCGACAAAG TCTATACCTT TACCAAAGAC AGCTATCTGG TCAACGTCCG

501 CTTCGACATC GCCAACGGCA GCGGTCAAAC CGCCAACCTG AGCGCGGACT 

551 ACCGCATCGT CCGCGACCAC AGCGAACCCG AGGGTCAAGG CTACTTTACC 

601 CACTCTTACG TCGGCCCTGT TGTTTATACC CCTGAAGGCA ACTTCCAAAA 

651 AGTCAGCTTC TCCgacTTgg acgACGATGC gaaaTccggc aaATccgagg 

701 ccgaatacaT CCGCAAAACC ccgaccggtt ggctcggcat gattgaacac

751 cacttcatgt ccacctggat cctccAAcct aaaggcggcc aaaacgtttg

801 cgcccaggga gactgccgta tcgacattaa aCgccgcaac gacaagctgt 

851 acagcgcaag cgtcagcgtg cctttaaccg ctatcccaac ccgggggcca 

901 aaaccgaaaa tggcggTCAA CCTGTATGCC GGTCCGCAAA CCACATCCGT 

951 TATCGCAAAC ATCGCcgacA ACCTGCAACT GGCAAAAGAC TACGGTAAAG

1001 TACACTGGTT CGCATCGCCG CTCTTCTGGC TCCTGAACCA ACTGCACAAC 

1051 ATTATCGGCA ACTGGGGCTG GGCAATCGTC GTTTTGACCA TCATCGTCAA 

1101 AGCCGTACTG TATCCATTGA CCAACGcctc ctACCGTTCG ATGGCGAAAA 

1151 TGCGTGccgc cgcacCcaaA CTGCAGACCA TCAAAGAAAA ATAcgGCGAC 

1201 GACCGTATGG CGCAACAGCA AGCGATGATG CAGCTTTACA AAgacgAGAA

1251 AATCAACCCG CTGGGCGGCT GTctgcctat gctgttgCAA ATCCCCGTCT

1301 TCATCGGCTT GTACTGGGCA TTGTTCGCCT CCGTAGAATT GCGCCAGGCA 

1351 CCTTGGCTGG GCTGGATTAC CGACCTCAGC CGCGCCGACC CCTACTACAT 

1401 CCTGCCCATC ATTATGGCGG CAACGATGTT CGCCCAAACC TATCTGAACC

1451 CGCCGCCGAC CGACCCGATG CAGGCGAAAA TGATGAAAAT CATGCCGTTG

1501 GTTTTCTCCG TCATGTTCTT CTTCTTCCCT GCCGGTTTGG TTCTCTACTG 

1551 GGTGGTCAAC AACCTCCTGA CCATCGCCCA GCAGTGGCAC ATCAACCGCA 

1601 GCATCGAAAA ACAACGCGCC CAAGGCGAAG TCGTTTCCTA A 

This encodes a protein having amino acid sequence <SEQ ID 58; ORF11ng-1>:

1 MDFKRLTAFF AIALVIMIGW EKMFPTPKPV PAPQQAAQKQ AATASAEAAL

51 APATPITVTT DTVQAVIDEK SGDLRRLTLL KYKATGDENK PFVLFGDGKE

101 YTYVAQSELL DAQGNNILKG IGFSAPKKQY TLNGDTVEVR LSAPETNGLK

151 IDKVYTFTKD SYLVNVRFDI ANGSGQTANL SADYRIVRDH SEPEGQGYFT

201 HSYVGPVVYT PEGNFQKVSF SDLDDDAKSG KSEAEYIRKT PTGWLGMIEH

251 HFMSTWILQP KGGQNVCAQG DCRIDIKRRN DKLYSASVSV PLTAIPTRGP

301 KPKMAVNLYA GPQTTSVIAN IADNLQLAKD YGKVHWFASP LFWLLNQLHN

351 IIGNWGWAIV VLTIIVKAVL YPLTNASYRS MAKMRAAAPK LQTIKEKYGD

401 DRMAQQQAMM QLYKDEKINP LGGCLPMLLQ IPVFIGLYWA LFASVELRQA

451 PWLGWITDLS RADPYYILPI IMAATMFAQT YLNPPPTDPM QAKMMKIMPL

501 VFSVMFFFFP AGLVLYWVVN NLLTIAQQWH INRSIEKQRA QGEVVS*



ORF11ng-1 and ORF11-1 show 95.1% identity in 546 aa overlap:
                       10        20        30        40        50        60
orf11ng-1.pep  MDFKRLTAFFAIALVIMIGWEKMFPTPKPVPAPQQAAQKQAATASAEAALAPATPITVTT
               ||||||||||||||||||||||||||||||||||||||:||:||||||||||||||||||
orf11-1        MDFKRLTAFFAIALVIMIGWEKMFPTPKPVPAPQQAAQQQAVTASAEAALAPATPITVTT
                       10        20        30        40        50        60

                       70        80        90       100       110       120
orf11ng-1.pep  DTVQAVIDEKSGDLRRLTLLKYKATGDENKPFVLFGDGKEYTYVAQSELLDAQGNNILKG
               ||||||||||||||||||||||||||||||||:|||||||||||||||||||||||||||
orf11-1        DTVQAVIDEKSGDLRRLTLLKYKATGDENKPFILFGDGKEYTYVAQSELLDAQGNNILKG
                       70        80        90       100       110       120



130 140 150 160 170 180 

orf llng-l pep IGFSAPKKQYTLNGDTVEVRLSAPETNGLKIDKVYTFTKDSYLVNVRFDIANGSGQTANL 
I | I I I I I I I I : t : I i I I I I I I I I I I I I M II I I I I I I I I I 11 I I I I I I I I I I I I I I I 
orfll-1 IGFSAPKKQYSLEGDKVEVRLSAPETRGLKIDKVYTFTKGSYLVNVRFDIANGSGQTANL 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf11ng-1.pep  SADYRIVRDHSEPEGQGYFTHSYVGPVVYTPEGNFQKVSFSDLDDDAKSGKSEAEYIRKT
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf11-1        SADYRIVRDHSEPEGQGYFTHSYVGPVVYTPEGNFQKVSFSDLDDDAKSGKSEAEYIRKT

190 200 210 220 230 240 



25 250 260 270 280 290 300 

orf llng-l. pep PTGWLGMIEHHFMSTWILQPKGGQNVCAQGDCRIDIKRRNDKLYSASVSVPLTAIPTRGP 

I I I I I I I I I II II I I I I I I I I I 1:111 1:1 I I I I I I I I I I I I : M I I I I : I I : I 
orfll-1 PTGWLGMIEHHFMSTWILQPKGRQSVCAAGECNIDIKRRNDKLYSTSVSVPLAAIQN-GA 

250 260 270 280 290 

310 320 330 340 350 360 

orf llng-l . pep KPKMAVNLYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIV 
I : :: I I I I I I I I I II I I I I I I I I I I I I I I I I I I II I I I I I II I I I I I I I I I I I I I I I : 
orfll-1 KAEASINLYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAII 
35 300 310 320 330 340 350 



370 380 390 400 410 420 

orf llng-l. pep VLTIIVKAVLYPLTNASYRSMAKMRAAAPKLQTIKEKYGDDRMAQQQAMMQLYKDEKINP 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I M I I I I I I 
40 orfll-1 VLTIIVKAVLYPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRMAQQQAMMQLYTDEKINP 

360 370 380 390 400 410 



430 440 450 460 470 480 

orf11ng-1.pep  LGGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQT
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf11-1        LGGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQT
420 430 440 450 460 470 



490 500 510 520 530 540 

orf11ng-1.pep  YLNPPPTDPMQAKMMKIMPLVFSVMFFFFPAGLVLYWVVNNLLTIAQQWHINRSIEKQRA
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf11-1        YLNPPPTDPMQAKMMKIMPLVFSVMFFFFPAGLVLYWVVNNLLTIAQQWHINRSIEKQRA
480 490 500 510 520 530 



orf11ng-1.pep  QGEVVSX
               |||||||
orf11-1        QGEVVSX
540 

In addition, ORF11ng-1 shows significant homology with an inner-membrane protein from the
database (accession number P25754):



ID 60IM_PSEPU STANDARD; PRT; 560 AA. 

AC P25754; 

DT 01-MAY-1992 (REL. 22, CREATED) 

DT 01-MAY-1992 (REL. 22, LAST SEQUENCE UPDATE)

DT 01-NOV-1995 (REL. 32, LAST ANNOTATION UPDATE) 

DE 60 KD INNER-MEMBRANE PROTEIN. . . . 
SCORES Init1: 1074 Initn: 1293 Opt: 1103

Smith-Waterman score: 1406; 41.5% identity in 574 aa overlap 

10 20 30 40 

orf llng-1 . pep MDFKR LTAFFAIALVIMIGW EKMFPT PKPVPAPQQAAQKQ 

M: I I ::(::::!:::! : :ll I III :::|: : 

P 25754 MDIKRTILIAALAWSYVMVLKWNDDYGQAALPTQNTAASTVAPGLPDGVPAGNNGASAD 

10 20 30 40 50 60 



50 60 70 80 90 

orf llng-1. pep AATAS AEAALAPAT P IT VTTDTVQAVIDEKSGDLRRLTLLKYKATGDE-NKPF 

: : | : | I : : I : I : : | I I : : : : I I : I I : : I : I M I : I I I 

p25754 VPSANAESSPAELAPVALSKDLIRVKTDVLELAIDPVGGDIVQLNLPKYPRRQDHPNIPF 
15 70 80 90 100 110 120 

100 110 120 130 140 

orf llng-1. pep VLFGDGKEYTYVAQSELLDAQGNNILKGIG FSAPKKQYTL-NGD TVEVRLSAPE 

I I : I I : I : I ! I I : : ! : : : I : : I : ! : I I : I : : I : : : : I 

20 p25754 QLFDNGGERVYLAQSGLTGTDGPDA-RASGRPLYAAEQKSYQLADGQEQLWDLKFS 

130 140 150 160 170 

150 160 170 180 190 200 

orf llng-1 . pep TNGLKIDKVYTFTKDSYLVNVRFDIANGSGQTANLSADYRIVRDHS-EPEGQGYF-THSY 
25 | | : : t : : I : i : I ( : ) I I I I : I : : : I I I : I : : I : I 

p25754 DNGVN YIKRFS FKRGE Y DLNVS YLI DNQSGQAWNGNMFAQLKRDASGDPS S STATGTATY 

180 190 200 210 220 230 

210 220 230 240 250 260 

30 orf llng-1 .pep VGPWYTPEGNFQKVSFSDLDDDAKSGKSEAEYIRKTPTGWLGMIEHHFMSTWILQPKGG 

:!:::! : : I I I : : I : I I : : : I : : I I : : : : I : I : : : M I : 

p257 54 LGAALWT AS E P YKKVSMKD I D KGSLKE NVS GGWVAWLQHY FVT AW I - P AKS D 

240 250 260 270 280 

35 270 280 290 300 310 320 

orf llng-1 . pep QNVCAQGDCRIDIKRRNDKLYSASVSVPLTAIPTRGPKPKMAVNLYAGPQTTSVIANIAD 
: I 1 :::::: I : : I : : : I : i I : : : I I I M : I : : : : 

p25754 NNV VQTRKDSQGNYIIGYTGPVISVPA-GGKVETSALLYAGPKIQSKLKELSP 

290 300 310 320 330 

40 

330 340 350 360 370 380 

orf llng-1. pep NLQLAKDYGKVHWF- AS PLFWLLNQLHNI IGNWGWAI WLT 1 1 VKAVLYPLTNAS YRSMA 
: I : I : III : II I : I : II I I : : : I : : : I I I II : I : I I I : : : I : : : : II : Mil, 1 !,' 
p257 54 GLELTVDYGFL-WFIAQPIFWLLQHIHSLLGNWGWSIIVLTMLIKGLFFPLSAASYRSMA 
45 340 350 360 370 380 390 

390 400 410 420 430 440 

orf llng-1 . pep KMRAAAPKLQTIKEKYGDDRMAQQQAMMQLYKDEKINPLGGCLPMLLQIPVFIGLYWALF 
: I I I : I I I I : : I I : : I I | I : : : I I 1 I : I I I I I I I ! I ! I I I I : I : I : I II : : I I I : I : 
50 p25754 RMRAVAPKLAALKERFGDDRQKMSQAMMELYKKEKINPLGGCLPILVQMPVFLALYWVLL 

400 410 420 430 440 450 

450 460 470 480 490 500 

orf llng-1 .pep ASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTYLNPPPTDPMQAKMMKIMPLVF 
55 M i : I i II I : I I II I I I I : : I (I I f i : I 1 I t I I I I I I I I II I : I I : I I : : I 

p25754 ESVEMRQAPWILWITDLSIKDPFFILPIIMGATMFIQQRLNPTPPDPMQAKVMKMMPIIF 
460 470 480 490 500 510 

510 520 530 540 

60 orf llng-1 .pep S VM F FF F PAG L V L Y W V VNN L LT I AQQ W H I NR S I E KQRAQG E W S X 

: : I :: I I I I I I I I I I I I I I : I : I I I : I : I II 
p 2 5 7 5 4 T FFFLW FPAGLVLYWWNNCLS I SQQWY ITRRIE AATKKAAA 

520 530 540 550 560 

Based on this analysis, including the homology to an inner-membrane protein from P. putida and
the predicted transmembrane domains (seen in both the meningococcal and gonococcal proteins), it
is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.
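
The database comparison above quotes a Smith-Waterman score. For orientation, a minimal sketch of the Smith-Waterman local alignment recurrence is given below. The scoring values (match 2, mismatch -1, linear gap -2) are arbitrary assumptions chosen for illustration, so the sketch will not reproduce the score of 1406 quoted above, which depends on the substitution matrix and gap penalties of the program actually used.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score under a simple match/mismatch/gap scheme."""
    previous = [0] * (len(b) + 1)
    best = 0
    for residue_a in a:
        current = [0]
        for j, residue_b in enumerate(b, start=1):
            diagonal = previous[j - 1] + (match if residue_a == residue_b else mismatch)
            score = max(
                0,                  # start a new local alignment
                diagonal,           # align residue_a with residue_b
                previous[j] + gap,  # gap in b
                current[j - 1] + gap,  # gap in a
            )
            current.append(score)
            best = max(best, score)
        previous = current
    return best


# The motif PLGGCLP is shared by ORF11ng-1 and the P. putida protein.
print(smith_waterman_score("AAAPLGGCLPAAA", "TTTPLGGCLPTTT"))
```

Only two rows of the scoring matrix are kept in memory because the sketch returns the score alone and does not trace back the alignment.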

Example 8 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 59>: 

1 . . GCCGTCTTAA TCATCGAATT ATTGACGGGA ACGGTTTATC TTTTGGTTGT 

51 NAGCGCGGCT TTGGCGGGTT CGGGCATTGC TTACGGGCTG ACCGGCAGTA 

101 CGCCTGCCGC CGTCTTGACC GNCGCTCTGC TTTCCGCGCT GGGTATTTNG 

151 TTCGTACACG CCAAAACCGC CGTTAGAAAA GTTGAAACGG ATTCATATCA 

201 GGATTTGGAT GCCGGACAAT ATGTCGAAAT CCTCCGNCAC ACAGGCGGCA

251 ACCGTTACGA AGTT.TTTAT CGCGGTACG . ACTGGCAGGC TCAAAATACG 

301 GGGCAAGAAG AGCTTGAACC AGGAACTCGC GCCCTCATTG TCCGCAAGGA 

351 AGGCAACCTT CTTATTATCA CACACCCTTA A 

This corresponds to the amino acid sequence <SEQ ID 60; ORF13>:

1 ..AVLIIELLTG TVYLLVVSAA LAGSGIAYGL TGSTPAAVLT XALLSALGIX
51 FVHAKTAVRK VETDSYQDLD AGQYVEILRH TGGNRYEVXY RGTXWQAQNT
101 GQEELEPGTR ALIVRKEGNL LIITHP*

Further sequence analysis elaborated the DNA sequence slightly <SEQ ID 61>: 

1 . . GCCGTCTTAA TCATCGAATT ATTGACGGGA ACGGTTTATC TTTTGGTTGT 

51 nAGCGCGGCT TTGGCGGGTT CGGGCATTGC TTACGGGCTG ACCGGCAGTA 

101 CGCCTGCCGC CGTCTTGACC GnCGCTCTGC TTTCCGCGCT GGGTATTTnG 

151 TTCGTACACG CCAAAACCGC CGTTAGAAAA GTTGAAACGG ATTCATATCA 

201 GGATTTGGAT GCCGGACAAT ATGTCGAAAT CCTCCGACAC ACAGGCGGCA 

251 ACCGTTACGA AGTTTTtTAT CGCGGTACGc ACTGGCAGGC TCAAAATACG 

301 GGGCAAGAAG AGCTTGAACC AGGAACTCGC GCCCTCATTG TCCGCAAGGA 

351 AGGCAACCTT CTTATTATCA CACACCCTTA A 

This corresponds to the amino acid sequence <SEQ ID 62; ORF13-1>:

1 ..AVLIIELLTG TVYLLVVSAA LAGSGIAYGL TGSTPAAVLT XALLSALGIX
51 FVHAKTAVRK VETDSYQDLD AGQYVEILRH TGGNRYEVFY RGTHWQAQNT
101 GQEELEPGTR ALIVRKEGNL LIITHP*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF13 shows 92.9% identity over a 126aa overlap with an ORF (ORF13a) from strain A of N. 
meningitidis: 

                            10        20        30        40        50
orf13.pep           AVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTXALLSALGIXF
                    |||||||||||||||||||||||||||||||||||||||| |||||||| |
orf13a     MTVWFVAAVAVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTAALLSALGIWF
                   10        20        30        40        50        60

60 70 80 90 100 110 

orf 13. pep VHAKTAVRKVETDSYQDLDAGQYVEILRHTGGNRYEVXYRGTXWQAQNTGQEELEPGTRA 
I I i I f I t I II t I I I i I I f 1 I I I : i I I II : I I I I I I I Nil I I I I I j i I I I I I M I I I 
orf 13a VHAKT AVGK VET D S YQ D LD AGQ YAE I LRH AG GNR YE V F YRGT H W QAQN T G QEE LE P GT RA 

70 80 90 100 110 120 

120 

orf13.pep  LIVRKEGNLLIITHPX
           ||||||||||||::||
orf13a     LIVRKEGNLLIIAKPX

130 
The complete length ORF13a nucleotide sequence <SEQ ID 63> is:

1 ATGACTGTAT GGTTTGTTGC CGCTGTTGCC GTCTTAATCA TCGAATTATT 

51 GACGGGAACG GTTTATCTTT TGGTTGTCAG CGCGGCTTTG GCGGGTTCGG 

101 GCATTGCTTA CGGGCTGACC GGCAGCACGC CTGCCGCCGT CTTGACCGCC 

151 GCTCTGCTTT CCGCGCTGGG TATTTGGTTC GTACACGCCA AAACCGCCGT 

201 GGGAAAAGTT GAAACGGATT CATATCAGGA TTTGGATGCC GGGCAATATG 

251 CCGAAATCCT CCGGCACGCA GGCGGCAACC GTTACGAAGT TTTTTATCGC 

301 GGTACGCACT GGCAGGCTCA AAATACGGGG CAAGAAGAGC TTGAACCAGG 

351 AACGCGCGCC CTAATCGTCC GCAAGGAAGG CAACCTTCTT ATCATCGCAA 

401 AACCTTAA

This encodes a protein having amino acid sequence <SEQ ID 64>: 



1 MTVWFVAAVA VLIIELLTGT VYLLVVSAAL AGSGIAYGLT GSTPAAVLTA 
51 ALLSALGIWF VHAKTAVGKV ETDSYQDLDA GQYAEILRHA GGNRYEVFYR
101 GTHWQAQNTG QEELEPGTRA LIVRKEGNLL IIAKP* 

ORF13a and ORF13-1 show 94.4% identity in 126 aa overlap:



                    10        20        30        40        50        60
orf13a.pep  MTVWFVAAVAVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTAALLSALGIWF
                     |||||||||||||||||||||||||||||||||||||||| |||||||| |
orf13-1              AVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTXALLSALGIXF
                             10        20        30        40        50



                    70        80        90       100       110       120
orf13a.pep  VHAKTAVGKVETDSYQDLDAGQYAEILRHAGGNRYEVFYRGTHWQAQNTGQEELEPGTRA
            ||||||| |||||||||||||||:|||||:||||||||||||||||||||||||||||||
orf13-1     VHAKTAVRKVETDSYQDLDAGQYVEILRHTGGNRYEVFYRGTHWQAQNTGQEELEPGTRA
                   60        70        80        90       100       110



130 

orf13a.pep  LIVRKEGNLLIIAKPX
            ||||||||||||::||
orf13-1     LIVRKEGNLLIITHPX
120 
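
The percent identity figures quoted for these overlaps can be recomputed from the listed sequences. The sketch below counts identical columns in an ungapped overlap and divides by the overlap length. It assumes that an undetermined residue (X) is compared literally and therefore counts as a mismatch against a determined residue; the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return (percent identity, number of compared columns)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns in which both sequences carry a gap character.
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if not (a == gap and b == gap)
    ]
    if not columns:
        raise ValueError("no columns to compare")
    identical = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * identical / len(columns), len(columns)


# ORF13-1 <SEQ ID 62> and ORF13a <SEQ ID 64>, written in blocks of ten.
orf13_1 = (
    "AVLIIELLTG" "TVYLLVVSAA" "LAGSGIAYGL" "TGSTPAAVLT" "XALLSALGIX"
    "FVHAKTAVRK" "VETDSYQDLD" "AGQYVEILRH" "TGGNRYEVFY" "RGTHWQAQNT"
    "GQEELEPGTR" "ALIVRKEGNL" "LIITHP"
)
orf13a = (
    "MTVWFVAAVA" "VLIIELLTGT" "VYLLVVSAAL" "AGSGIAYGLT" "GSTPAAVLTA"
    "ALLSALGIWF" "VHAKTAVGKV" "ETDSYQDLDA" "GQYAEILRHA" "GGNRYEVFYR"
    "GTHWQAQNTG" "QEELEPGTRA" "LIVRKEGNLL" "IIAKP"
)

# ORF13-1 aligns with ORF13a from residue 10 onwards, without gaps.
identity, overlap = percent_identity(orf13a[9:], orf13_1)
print(round(identity, 1), overlap)  # 94.4 126
```

With these inputs there are 119 identical positions in the 126 aa overlap, which agrees with the 94.4% stated above.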



Homology with a predicted ORF from N. gonorrhoeae 

ORF13 shows 89.7% identity over a 126aa overlap with a predicted ORF (ORF13.ng) from N. 
gonorrhoeae: 



orf13             AVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTXALLSALGIXF 51
                  |||||||||||||||||||||||||||||||||||||||| |||||||| |
orf13ng  MTVWFVAAVAVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTAALLSALGIWF 60

orf13    VHAKTAVRKVETDSYQDLDAGQYVEILRHTGGNRYEVXYRGTXWQAQNTGQEELEPGTRA 111

i I i I I I I II II I I I I f I I : I : I : I I i I : I I I I I I II I II I MINIMI MINI) 

orfl3ng VHAKTAVGKVETDSYQDLDTGKYAEILRYTGGNRYEVFYRGTHWQAQNTGQEVFEPGTRA 120 



orf13    LIVRKEGNLLIITHP 126
         ||||||||||||::|
orf13ng  LIVRKEGNLLIIANP 135

The complete length ORF13ng nucleotide sequence <SEQ ID 65> is: 

1 ATGACTGTAT GGTTTGTTGC CGCTGTTGCC GTCTTAATCA TCGAATTATT 

51 GACGGGAACG GTTTATCTTT TGGTTGTCAG CGCGGCTTTG GCGGGTTCGG 

101 GCATTGCCTA CGGGCTGACT GGCAGCACGC CTGCCGCCGT CTTGACCGCC 

151 GCACTGCTTT CCGCGCTGGG CATTTGGTTC GTACATGCCA AAACCGCCGT 

201 GGGAAAAGTT GAAACGGATT CATATCAGGA TTTGGATACC GGAAAATATG 

251 CCGAAATCCT CCGATACACA GGCGGCAACC GTTACGAAGT TTTTTATCGC 

301 GGTACGCACT GGCAGGCGCA AAATACGGGG CAGGAAGTGT TTGAACCGGG 

351 AACGCGCGCC CTCATCGTCC GCAAAGAAGG TAACCTTCTT ATCATCGCAA 

401 ACCCTTAA

This encodes a protein having amino acid sequence <SEQ ID 66>:



1 MTVWFVAAVA VLIIELLTGT VYLLVVSAAL AGSGIAYGLT GSTPAAVLTA
51 ALLSALGIWF VHAKTAVGKV ETDSYQDLDT GKYAEILRYT GGNRYEVFYR 
101 GTHWQAQNTG QEVFEPGTRA LIVRKEGNLL IIANP* 
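
The relationship between a listed nucleotide sequence and the protein it encodes can be checked with a short translation routine. The sketch below is illustrative only and is not part of the original disclosure: it assumes the standard genetic code, reads the sequence in frame from its first base, and renders any codon containing an ambiguity symbol (such as the lower-case "w", "r", "s" or "." seen in the partial sequences) as "X". The function name `translate` is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; unknown codons become 'X'."""
    # Remove the spaces used to group bases in tens and normalise case.
    seq = "".join(dna.split()).upper()
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# First 50 bases of the ORF13ng sequence (SEQ ID 65) as listed above.
print(translate("ATGACTGTAT GGTTTGTTGC CGCTGTTGCC GTCTTAATCA TCGAATTATT"))
# Last 18 bases of the same sequence (positions 391-408).
print(translate("ATCATCGCAA ACCCTTAA"))
```

Applied to the first and last rows of SEQ ID 65, this gives the N-terminal residues MTVWFVAAVAVLIIEL and the C-terminal residues IIANP followed by the stop codon, in agreement with SEQ ID 66.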



ORF13ng shows 91.3% identity in 126 aa overlap with ORF13-1: 



                             10        20        30        40        50
orf13-1.pep          AVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTXALLSALGIXF
orf13ng     MTVWFVAAVAVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTAALLSALGIWF
                    10        20        30        40        50        60

                   60        70        80        90       100       110
orf13-1.pep VHAKTAVRKVETDSYQDLDAGQYVEILRHTGGNRYEVFYRGTHWQAQNTGQEELEPGTRA
orf13ng     VHAKTAVGKVETDSYQDLDTGKYAEILRYTGGNRYEVFYRGTHWQAQNTGQEVFEPGTRA
                    70        80        90       100       110       120

                  120
orf13-1.pep LIVRKEGNLLIITHPX
orf13ng     LIVRKEGNLLIIANPX
                   130


Based on this analysis, including the extensive leader sequence in this protein, it is predicted that 
ORF13 and ORF13ng are likely to be outer membrane proteins. It is thus predicted that the proteins
from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines
or diagnostics, or for raising antibodies. 

Example 9 

The following DNA sequence was identified in N. meningitidis <SEQ ID 67>: 



1 ATGTwTGATT TCGGTTTrGG CGArCTGGTT TTTGTCGGCA TTATCGCCCT 

51 GATwGtCCTC GGCCCCGAAC GCsTGCCCGA GGCCGCCCGC AyCGCCGGAC 

101 GGcTCATCGG CAGGCTGCAA CGCTTTGTCG GcAGCGTCAA ACAGGAATTT 

151 GACACTCAAA TCGAACTGGA AGAACTGAGG AAGGCAAAGC AGGAATTTGA 

201 AGCTGCCGcC GCTCAGGTTC GAGACAGCCT CAAAGAAACC GGTACGGATA

251 TGGAAGGCAA TCTGCACGAC ATTTCCGACG GTCTGAAGCC TTGGGAAAAA 

301 CTGCCCGAAC AGCGGACACC TGCCGATTTC GGTGTCGATG AAAACGGCAA 

351 TCCGCT.TCC CGATGCGGCA AACACCCTAT CAGACGGCAT TTCCGACGTT 

401 ATGCCGTC



This corresponds to the amino acid sequence <SEQ ID 68; ORF2>:

1 MXDFGLGELV FVGIIALIVL GPERXPEAAR XAGRLIGRLQ RFVGSVKQEF
51 DTQIELEELR KAKQEFEAAA AQVRDSLKET GTDMEGNLHD ISDGLKPWEK
101 LPEQRTPADF GVDENGNPXS RCGKHPIRRH FRRYAV..

Further work revealed the complete nucleotide sequence <SEQ ID 69>:

1 ATGTTTGATT TCGGTTTGGG CGAGCTGGTT TTTGTCGGCA TTATCGCCCT
51 GATTGTCCTC GGCCCCGAAC GCCTGCCCGA GGCCGCCCGC ACCGCCGGAC
101 GGCTCATCGG CAGGCTGCAA CGCTTTGTCG GCAGCGTCAA ACAGGAATTT
151 GACACTCAAA TCGAACTGGA AGAACTGAGG AAGGCAAAGC AGGAATTTGA
201 AGCTGCCGCC GCTCAGGTTC GAGACAGCCT CAAAGAAACC GGTACGGATA
251 TGGAAGGCAA TCTGCACGAC ATTTCCGACG GTCTGAAGCC TTGGGAAAAA
301 CTGCCCGAAC AGCGGACACC TGCCGATTTC GGTGTCGATG AAAACGGCAA
351 TCCGCTTCCC GATGCGGCAA ACACCCTATC AGACGGCATT TCCGACGTTA
401 TGCCGTCCGA ACGTTCCTAC GCTTCCGCCG AAACCCTTGG GGACAGCGGG
451 CAAACCGGCA GTACAGCCGA ACCCGCGGAA ACCGACCAAG ACCGCGCATG
501 GCGGGAATAC CTGACTGCTT CTGCCGCCGC ACCCGTCGTA CAGACCGTCG
551 AAGTCAGCTA TATCGATACT GCTGTTGAAA CGCCTGTTCC GCACACCACT
601 TCCCTGCGCA AACAGGCAAT AAGCCGCAAA CGCGATTTTC GTCCGAAACA
651 CCGCGCCAAA CCTAAATTGC GCGTCCGTAA ATCATAA

This corresponds to the amino acid sequence <SEQ ID 70; ORF2-1>:

1 MFDFGLGELV FVGIIALIVL GPERLPEAAR TAGRLIGRLQ RFVGSVKQEF
51 DTQIELEELR KAKQEFEAAA AQVRDSLKET GTDMEGNLHD ISDGLKPWEK
101 LPEQRTPADF GVDENGNPLP DAANTLSDGI SDVMPSERSY ASAETLGDSG
151 QTGSTAEPAE TDQDRAWREY LTASAAAPVV QTVEVSYIDT AVETPVPHTT
201 SLRKQAISRK RDFRPKHRAK PKLRVRKS*

Further work identified the corresponding gene in strain A of N. meningitidis <SEQ ID 71>:

1 ATGTTTGATT TCGGTTTGGG CGAGCTGGTT TTTGTCGGCA TTATCGCCCT 

51 GATTGTCCTC GGCCCCGAAC GCCTGCCCGA GGCCGCCCGC ACCGCCGGAC 

101 GGCTCATCGG CAGGCTGCAA CGCTTTGTCG GCAGCGTCAA ACAGGAATTT 

151 GACACGCAAA TCGAACTGGA AGAACTAAGG AAGGCAAAGC AGGAATTTGA 

201 AGCTGCCGCT GCTCAGGTTC GAGACAGCCT CAAAGAAACC GGTACGGATA 

251 TGGAGGGTAA TCTGCACGAC ATTTCCGACG GTCTGAAGCC TTGGGAAAAA 

301 CTGCCCGAAC AGCGCACGCC TGCTGATTTC GGTGTCGATG AAAACGGCAA 

351 TCCCTTTCCC GATGCGGCAA ACACCCTATT AGACGGCATT TCCGACGTTA 

401 TGCCGTCCGA ACGTTCCTAC GCTTCCGCCG AAACCCTTGG GGACAGCGGG 

451 CAAACCGGCA GTACAGCCGA ACCCGCGGAA ACCGACCAAG ACCGTGCATG

501 GCGGGAATAC CTGACTGCTT CTGCCGCCGC ACCCGTCGTA CAGACCGTCG 

551 AAGTCAGCTA TATCGATACC GCTGTTGAAA CGCCTGTTCC GCATACCACT 

601 TCGCTGCGTA AACAGGCAAT AAGCCGCAAA CGCGATTTGC GTCCTAAATC 

651 CCGCGCCAAA CCTAAATTGC GCGTCCGTAA ATCATAA 

This encodes a protein having amino acid sequence <SEQ ID 72; ORF2a>: 

1 MFDFGLGELV FVGIIALIVL GPERLPEAAR TAGRLIGRLQ RFVGSVKQEF
51 DTQIELEELR KAKQEFEAAA AQVRDSLKET GTDMEGNLHD ISDGLKPWEK
101 LPEQRTPADF GVDENGNPFP DAANTLLDGI SDVMPSERSY ASAETLGDSG
151 QTGSTAEPAE TDQDRAWREY LTASAAAPVV QTVEVSYIDT AVETPVPHTT
201 SLRKQAISRK RDLRPKSRAK PKLRVRKS*

The originally-identified partial strain B sequence (ORF2) shows 97.5% identity over a 118aa 
overlap with ORF2a: 

10 20 30 40 50 60 

orf2.pep MXDFGLGELVFVGIIALIVLGPERXPEAARXAGRLIGRLQRFVGSVKQEFDTQIELEELR
I ! I I I ! I I I I II I I I I I I I I I II I I I M : I I I ! I II I I I I I I I I I I I I I I I I I I I I I I 
orf2a MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR

10 20 30 40 50 60 

70 80 90 100 110 120 

orf2.pep KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPXS

I I I I I I I I II I I I I I II I I I I I I II I II 1 I 1 I I I II I I I I II I I I II II I I I I I I ! I I 
orf2a KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPFP

70 80 90 100 110 120 

130 

orf2.pep RCGKHPIRRHFRRYAV

orf2a DAANTLLDGISDVMPSERSYASAETLGDSGQTGSTAEPAETDQDRAWREYLTASAAAPVV

130 140 150 160 170 180 

The complete strain B sequence (ORF2-1) and ORF2a show 98.2% identity in 228 aa overlap: 

orf2a.pep   MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR 60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf2-1      MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR 60

orf2a.pep   KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPFP 120
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||:|
orf2-1      KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPLP 120

orf2a.pep DAANTLLDGISDVMPSERSYASAETLGDSGQTGSTAEPAETDQDRAWREYLTASAAAPVV 180

Mltll MMMINIIIIllIllllliMiiii ililllllltllMllltltttltt 
orf2-1 DAANTLSDGISDVMPSERSYASAETLGDSGQTGSTAEPAETDQDRAWREYLTASAAAPVV 180

orf2a.pep QTVEVSYIDTAVETPVPHTTSLRKQAISRKRDLRPKSRAKPKLRVRKSX 229

| | | i | | I [ I I I II I I I II I f I 1 I I I I I II I t I: I I I I I I I I II I I I I I 
orf2-l QTVEVSYIDTAVETPVPHTTSLRKQAISRKRDFRPKHRAKPKLRVRKSX 229 
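
The identity figures quoted for these alignments can be reproduced by counting matching positions. The following is a minimal sketch rather than the program actually used for the analysis; it assumes two ungapped sequences of equal length (as is the case for ORF2-1 and ORF2a) and, as an assumption of this sketch, treats an undetermined residue "X" as a mismatch. The name `percent_identity` is hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned positions carrying the same residue."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # An undetermined residue 'X' is never counted as a match (assumption).
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "X")
    return 100.0 * matches / len(a)


# ORF2-1 (SEQ ID 70) and ORF2a (SEQ ID 72), without the stop codon.
ORF2_1 = (
    "MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEF"
    "DTQIELEELRKAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEK"
    "LPEQRTPADFGVDENGNPLPDAANTLSDGISDVMPSERSYASAETLGDSG"
    "QTGSTAEPAETDQDRAWREYLTASAAAPVVQTVEVSYIDTAVETPVPHTT"
    "SLRKQAISRKRDFRPKHRAKPKLRVRKS"
)
ORF2A = (
    "MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEF"
    "DTQIELEELRKAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEK"
    "LPEQRTPADFGVDENGNPFPDAANTLLDGISDVMPSERSYASAETLGDSG"
    "QTGSTAEPAETDQDRAWREYLTASAAAPVVQTVEVSYIDTAVETPVPHTT"
    "SLRKQAISRKRDLRPKSRAKPKLRVRKS"
)

print(len(ORF2_1), round(percent_identity(ORF2_1, ORF2A), 1))
```

The two 228-residue sequences differ at four positions (119, 127, 213 and 217), so 224 of 228 positions match, which rounds to the 98.2% stated above.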

Further work identified a partial DNA sequence <SEQ ID 73> in N. gonorrhoeae encoding the
following amino acid sequence <SEQ ID 74; ORF2ng>: 

1 MFDFGLGELI FVGIIALIVL GPERLPEAAR TAGRLIGRLQ RFVGSVKQEL
51 DTQIELEELR KVKQAFEAAA AQVRDSLKET DTDMQNSLHD ISDGLKPWEK
101 LPEQRTPADF GVDEKGNSLS RYGKHRIRRH FRRYAV*

Further work identified the complete gonococcal gene sequence <SEQ ID 75>: 

1 ATGTTTGATT TCGGTTTGGG CGAGCTGATT TTTGTCGGCA TTATCGCCCT 

51 GATTGTCCTT GGTCCAGAAC GCCTGCCCGA AGCCGCCCGC ACTGCCGGAC 

101 GGCTTATCGG CAGGCTGCAA CGCTTTGTAG GAAGCGTCAA ACAAGAACTT 

151 GACACTCAAA TCGAACTGGA AGAGCTGAGG AAGGTCAAGC AGGCATTCGA

201 AGCTGCCGCC GCTCAGGTTC GAGACAGCCT CAAAGAAACC GATACGGATA 

251 TGCAGAACAG TCTGCACGAC ATTTCCGACG GTCTGAAGCC TTGGGAAAAA 

301 CTGCCCGAAC AGCGCACGCc tgccgatttc gGTGTCGATg AAAacggcaa 

351 tccccttccc gATACGGCAA ACACCGTATC AGACGGCATT TCCGACGTTA 

401 TGCCGTCTGA ACGTTCCGAT ACTtCCgcCG AAACCCTTGG GGACGACAGG

451 CAAACCGGCA GTACAGCCGA ACCTGCGGAA ACCGACAAAG ACCGCGCATG

501 GCGGGAATAC CTGactgctt ctgccgccgc acctgtcgta Cagagggccg 

551 tcgaagtcag ctaTATCGAT ACTGCTGTTG AAacgcctgT tccgcaCacc 

601 acttccctgc gcaAACAGGC AATAAACCGC AAACGCGATT TttgtccgaA 

651 ACACCGCGCc aAACCGAAat tgcgcgtcCG TAAATCATAA

This encodes a protein having the amino acid sequence <SEQ ID 76; ORF2ng-1>:

1 MFDFGLGELI FVGIIALIVL GPERLPEAAR TAGRLIGRLQ RFVGSVKQEL
51 DTQIELEELR KVKQAFEAAA AQVRDSLKET DTDMQNSLHD ISDGLKPWEK
101 LPEQRTPADF GVDENGNPLP DTANTVSDGI SDVMPSERSD TSAETLGDDR
151 QTGSTAEPAE TDKDRAWREY LTASAAAPVV QRAVEVSYID TAVETPVPHT
201 TSLRKQAINR KRDFCPKHRA KPKLRVRKS*

The originally-identified partial strain B sequence (ORF2) shows 87.5% identity over a 136aa 
overlap with ORF2ng: 

orf2.pep MXDFGLGELVFVGIIALIVLGPERXPEAARXAGRLIGRLQRFVGSVKQEFDTQIELEELR 60

40 I I I I II I I : I I I I II I I II I I I I I I I I I : I I I I I I I M I I I I I II I I : I I M I I I I I I 

orf2ng MFDFGLGELIFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQELDTQIELEELR 60

orf2.pep KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPXS 120

I : I I I I I I I I I I I I I I I M I I I ::: I I I I I I I II I I I I I I I M i I I I I i II i : I I 
orf2ng KVKQAFEAAAAQVRDSLKETDTDMQNSLHDISDGLKPWEKLPEQRTPADFGVDERGNSLP 120

orf2.pep RCGKHPIRRHFRRYAV 136 

I III I II I I II I I I 
orf2ng RYGKHRIRRHFRRYAV 136 

The complete strain B and gonococcal sequences (ORF2-1 & ORF2ng-1) show 91.7% identity in
229 aa overlap:

10 20 30 40 50 60 

orf2-1.pep MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR
I II I I M I I : I I I I I I I I I I II I I It I I I I I I I I II I I I I I I I I I I I I I : I I I II I I I I I 
orf2ng-1 MFDFGLGELIFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQELDTQIELEELR
10 20 30 40 50 60

70 80 90 100 110 120 

KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPLP 

|:|| | | 1 | 1 I I I I I ! I 1 I I Mf:::iimiMllitmtl)!ll!tM!IM!l!] 
KVKQAFEAAAAQVRDSLKETDTDMQNSLHDISDGLKPWEKLPEQRTPADFGVDENGNPLP 
70 80 90 100 110 120 

130 140 150 160 170 180 

DAANTLSDGISDVMPSERSYASAETLGDSGQTGSTAEPAETDQDRAWREYLTASAAAPVV

t : I I I : I I 1 I 1 I I I 1 ! I I I : M I I I M : i I I I i I 1 i I I I I : t f I II I I I I I I I I I I I i 
DTANTVSDGISDVMPSERSDTSAETLGDDRQTGSTAEPAETDKDRAWREYLTASAAAPVV
130 140 150 160 170 180 

190 200 210 220 229 

Q-TVEVSYIDTAVETPVPHTTSLRKQAISRKRDFRPKHRAKPKLRVRKSX 



190 200 210 220 230 

Computer analysis of these amino acid sequences indicates a transmembrane region (underlined),
and also revealed homology (34% identity, 59% positives) between the gonococcal sequence and the TatB protein
of E. coli:

gnl|PID|e1292181 (AJ005830) TatB protein [Escherichia coli] Length = 171
Score = 56.6 bits (134), Expect = 1e-07
Identities = 30/88 (34%), Positives = 52/88 (59%), Gaps = 1/88 (1%)

Query: 1 MFDFGLGELIFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQELDTQIELEELR 60 

MFD G EL+ V II L+VLGP+RLP A +T I L+ +V+ EL +++L+E + 
Sbjct: 1 MFDIGFSELLLVFIIGLVVLGPQRLPVAVKTVAGWIRALRSLATTVQNELTQELKLQEFQ 60 


Query: 61 -KVKQAFEAAAAQVRDSLKETDTDMQNS 87 

+K+ +A+ + LK + +++ + 
Sbjct: 61 DSLKKVEKASLTNLTPELKASMDELRQA 88
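
The percentages in the BLAST summary line follow directly from the raw counts over the 88-position aligned region. As an illustrative sketch (the helper name `blast_percentages` is hypothetical and is not part of BLAST itself), the arithmetic is:

```python
def blast_percentages(identities: int, positives: int, gaps: int, aligned_length: int) -> dict:
    """Convert raw BLAST counts into whole-number percentages."""
    def pct(count: int) -> int:
        # Round to the nearest whole percent, as in the summary line.
        return round(100 * count / aligned_length)

    return {
        "identities": pct(identities),
        "positives": pct(positives),
        "gaps": pct(gaps),
    }


# Counts reported for the ORF2ng-1 / TatB alignment.
print(blast_percentages(identities=30, positives=52, gaps=1, aligned_length=88))
```

"Identities" counts positions with the same residue in both sequences, whereas "Positives" also includes conservative substitutions (marked "+" in the alignment), which is why the second figure is the larger of the two.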

Based on this analysis, it was predicted that ORF2, ORF2a and ORF2ng are likely to be membrane
proteins and so the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.



ORF2-1 (16kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described above.
The products of protein expression and purification were analyzed by SDS-PAGE. Figure 3A
shows the results of affinity purification of the GST-fusion protein, and Figure 3B shows the results
of expression of the His-fusion in E. coli. Purified GST-fusion protein was used to immunise mice,
whose sera were used for Western blots (Figure 3C), ELISA (positive result), and FACS analysis
(Figure 3D). These experiments confirm that ORF2-1 is a surface-exposed protein, and that it is
a useful immunogen.



Example 10 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 77>:

1 ATGCAAGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC 
51 CGC.TGCGGG ACACTGACAG GTATTCCATC GCATGGCGgA GkTAAACgCT



101 TTgCGGTCGA ACAAGAACTT GTGGCCGCTT CTGCCAGAGC TGCCGTTAAA

151 GACATGGATT TACAGGCATT ACACGGACGA AAAGTTGCAT TGTACATTGC 

201 CACTATGGGC GACCAAGGTT CAGGcAGTTT GACAGGGGGG TCGCTACTCC 

251 ATTGATGCAC JcGrTwCsTGG CGAATACATA AACAGCCCTG CCGTCCGTAC 

301 CGATTACACC TATCCACGTT ACGAAACCAC CGCTGAAACA ACATCAGGCG 

351 GTTTGACAGG TTTAACCACT TCTTTATCTA CACTTAATGC CCCTGCACTC 

401 TCTCGCACCC AATCAGACGG TAGCGGAAGT AAAAGCAGTC TGGGCTTAAA 

451 TATTGGCGGG ATGGGGGATT ATCGAAATGA AACCTTGACG ACTAACCCGC 

501 GCGACACTGC CTTTCTTTCC CACTTGGTAC AGACCGTATT TTTCCTGCGC 

551 GGCATAGACG TTGTTTCTCC TGCCAATGCC GATACAGATG TGTTTATTAA 

601 CATCGACGTA TTCGGAACGA TACGCAACAG AACCGAAATG - - 

This corresponds to the amino acid sequence <SEQ ID 78; ORF15>: 

1 MQARLLIPIL FSVFILSACG TLTGIPSHGG XKRFAVEQEL VAASARAAVK 

51 DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDAXXXG EYINSPAVRT 

101 DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSKSSLGLN 

151 IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN

201 IDVFGTIRNR TEM. . 

Further work revealed the complete nucleotide sequence <SEQ ID 79>: 

1 ATGCAAGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC 

51 CGCCTGCGGG ACACTGACAG GTATTCCATC GCATGGCGGA GGTAAACGCT

101 TTGCGGTCGA ACAAGAACTT GTGGCCGCTT CTGCCAGAGC TGCCGTTAAA 

151 GACATGGATT TACAGGCATT ACACGGACGA AAAGTTGCAT TGTACATTGC 

201 CACTATGGGC GACCAAGGTT CAGGCAGTTT GACAGGGGGT CGCTACTCCA 

251 TTGATGCACT GATTCGTGGC GAATACATAA ACAGCCCTGC CGTCCGTACC 

301 GATTACACCT ATCCACGTTA CGAAACCACC GCTGAAACAA CATCAGGCGG 

351 TTTGACAGGT TTAACCACTT CTTTATCTAC ACTTAATGCC CCTGCACTCT 

401 CTCGCACCCA ATCAGACGGT AGCGGAAGTA AAAGCAGTCT GGGCTTAAAT

451 ATTGGCGGGA TGGGGGATTA TCGAAATGAA ACCTTGACGA CTAACCCGCG

501 CGACACTGCC TTTCTTTCCC ACTTGGTACA GACCGTATTT TTCCTGCGCG 

551 GCATAGACGT TGTTTCTCCT GCCAATGCCG ATACAGATGT GTTTATTAAC 

601 ATCGACGTAT TCGGAACGAT ACGCAACAGA ACCGAAATGC ACCTATACAA 

651 TGCCGAAACA CTGAAAGCCC AAACAAAACT GGAATATTTC GCAGTAGACA

701 GAACCAATAA AAAATTGCTC ATCAAACCAA AAACCAATGC GTTTGAAGCT 

751 GCCTATAAAG AAAATTACGC ATTGTGGATG GGGCCGTATA AAGTAAGCAA 

801 AGGAATTAAA CCGACGGAAG GATTAATGGT CGATTTCTCC GATATCCGAC

851 CATACGGCAA TCATACGGGT AACTCCGCCC CATCCGTAGA GGCTGATAAC 

901 AGTCATGAGG GGTATGGATA CAGCGATGAA GTAGTGCGAC AACATAGACA

951 AGGACAACCT TGA 

This corresponds to the amino acid sequence <SEQ ID 80; ORF15-1>:

1 MQARLLIPIL FSVFILSACG TLTGIPSHGG GKRFAVEQEL VAASARAAVK
51 DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDALIRG EYINSPAVRT
101 DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSKSSLGLN
151 IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN
201 IDVFGTIRNR TEMHLYNAET LKAQTKLEYF AVDRTNKKLL IKPKTNAFEA
251 AYKENYALWM GPYKVSKGIK PTEGLMVDFS DIRPYGNHTG NSAPSVEADN
301 SHEGYGYSDE VVRQHRQGQP *

Further work identified the corresponding gene in strain A of N. meningitidis <SEQ ID 81>:

1 ATGCAAGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC 

51 CGCCTGCGGG ACACTGACAG GTATTCCATC GCATGGCGGA GGTAAACGCT 

101 TTGCGGTCGA ACAAGAACTT GTGGCCGCTT CTGCCAGAGC TGCCGTTAAA 

151 GACATGGATT TACAGGCATT ACACGGACGA AAAGTTGCAT TGTACATTGC

201 AACTATGGGC GACCAAGGTT CAGGCAGTTT GACAGGGGGT CGCTACTCCA 

251 TTGATGCACT GATTCGTGGC GAATACATAA ACAGCCCTGC CGTCCGTACC 

301 GATTACACCT ATCCACGTTA CGAAACCACC GCTGAAACAA CATCAGGCGG 

351 TTTGACAGGT TTAACCACTT CTTTATCTAC ACTTAATGCC CCTGCACTCT 

401 CGCGCACCCA ATCAGACGGT AGCGGAAGTA AAAGCAGTCT GGGCTTAAAT

451 ATTGGCGGGA TGGGGGATTA TCGAAATGAA ACCTTGACGA CTAACCCGCG 

501 CGACACTGCC TTTCTTTCCC ACTTGGTACA GACCGTATTT TTCCTGCGCG 

551 GCATAGACGT TGTTTCTCCT GCCAATGCCG ATACGGATGT GTTTATTAAC 

601 ATCGACGTAT TCGGAACGAT ACGCAACAGA ACCGAAATGC ACCTATACAA 

651 TGCCGAAACA CTGAAAGCCC AAACAAAACT GGAATATTTC GCAGTAGACA 




701 GAACCAATAA AAAATTGCTC ATCAAACCAA AAACCAATGC GTTTGAAGCT 

751 GCCTATAAAG AAAATTACGC ATTGTGGATG GGACCGTATA AAGTAAGCAA 

801 AGGAATTAAA CCGACAGAAG GATTAATGGT CGATTTCTCC GATATCCAAC 

851 CATACGGCAA TCATATGGGT AACTCTGCCC CATCCGTAGA GGCTGATAAC 

901 AGTCATGAGG GGTATGGATA CAGCGATGAA GCAGTGCGAC GACATAGACA 

951 AGGGCAACCT TGA 

This encodes a protein having amino acid sequence <SEQ ID 82; ORF15a>: 

1 MQARLLIPIL FSVFILSACG TLTGIPSHGG GKRFAVEQEL VAASARAAVK

51 DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDALIRG EYINSPAVRT 

101 DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSKSSLGLN 

151 IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN 

201 IDVFGTIRNR TEMHLYNAET LKAQTKLEYF AVDRTNKKLL IKPKTNAFEA 

251 AYKENYALWM GPYKVSKGIK PTEGLMVDFS DIQPYGNHMG NSAPSVEADN 

301 SHEGYGYSDE AVRRHRQGQP * 

The originally-identified partial strain B sequence (ORF15) shows 98.1% identity over a 213aa
overlap with ORF15a:

10 20 30 40 50 60 

orflS pep MQARLLIPILFSVFILSA CGTLTGIPSHGGXKRFAVEQELVAASARAAVKDMDLQALHGR 
j I | I I j II f I I I I I I i I i I I I I I I I i I I I I I t I I I I I I t t 1 I 1 I I I I I I I I I I t I I t I t 
orf 15a MQARLLIPILFSVFILSA CGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 15 pep KVALYIATMGDQGSGSLTGGRYSIDAXXXGEYINSPAVRTDYTYPRYETTAETTSGGLTG 
I I I II I I I 1 i I I I I I I I I I II I I I I I I I I I I I I I i I M I I I I I I I I i M 1 I I I I M I 
orfl5a KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG 
70 80 90 100 110 120 

130 140 150 160 170 180 

orfl5.pep LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 
I I I I I I I I 11 I t I 1 I I I t I I I I I I I I M I I I I I I I I 11 M I I I I I I I I I I 1 I I I I I I I I I 
orfl5a LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 

130 140 150 160 170 180 



190 200 210 

orf15.pep FLRGIDVVSPANADTDVFINIDVFGTIRNRTEM
1 I I I I I I I M I I I I I I I I I I 1 I I I I I I I I M I I 
orf15a FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL

190 200 210 220 230 240 

The complete strain B sequence (ORF15-1) and ORF15a show 98.8% identity in 320 aa overlap: 



10 20 30 40 50 60 

orf 15a. pep MQARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR 
I I I I I ! I I I I ! I I 1 I I II I I I I II I II I 11 M I 1 I 1 I I It I 11 I 1 1 I I t I I 1 I 1 I I I 1 I 1 
orf 15-1 MQARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 15a. pep KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG 
I I I I I I I I I I II I I I I I I I 1 I I i I I I I I I I I I I I I I 1 I 1 I I I I I I 1 I I I I I t I 1 I I I I I I 
orf 15-1 KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf 15a. pep LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 
I I I I I II II I II I II I II I II I I I I i I I I I I II II I I I I II I I i I II I I I I i I I I I I I I I 
orf 15-1 LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 
130 140 150 160 170 180 



190 200 210 220 230 240 

orf15a.pep FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL
I I 1 I I I I I II M I I I I I I I I I I I I I I I I I I I I I II II I I I ! I I I I I I I II I I I I I I I I I I 
orf15-1 FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL
190 200 210 220 230 240

250 260 270 280 290 300 

IKPKTNAFEAAYKENYALWMGPYKVSKGIKPTEGLMVDFSDIQPYGNHMGNSAPSVEADN 

| ( | | | | | || I | | I I t I ! I I 1 I I I I I t I t I I I I I t I I t I I I I I : t I I I I I M I I I I I I I I 
IKPKTNAFEAAYKENYALWMGPYKVSKGIKPTEGLMVDFSDIRPYGNHTGNSAPSVEADN 

250 260 270 280 290 300 



310 320 
orf 15a . pep SHEGYGYSDEAVRRHRQGQPX 
I I II I I I I I I : i I : 1 1 I I I M 
orf15-1 SHEGYGYSDEVVRQHRQGQPX

310 320 

Further work identified the corresponding gene in N. gonorrhoeae <SEQ ID 83>: 



1 ATGCGGGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC 

51 CGCCTGCGGG ACACTGACAG GTATTCCATC GCATGGCGGA GGCAAACGCT 

101 TCGCGGTCGA ACAAGAACTT GTGGCCGCTT CTGCCAGAGC TGCCGTTAAA 

151 GACATGGATT TACAGGCATT ACACGGACGA AAAGTTGCAT TGTACATTGC 

201 AACTATGGGC GACCAAGGTT CAGGCAGTTT GACAGGGGGT CGCTACTCCA 

251 TTGATGCACT GATTCGCGGC GAATACATAA ACAGCCCTGC CGTCCGCACC 

301 GATTACACCT ATCCGCGTTA CGAAACCACC GCTGAAACAA CATCAGGCGG 

351 TTTGACGGGT TTAACCACTT CTTTATCTAC ACTTAATGCC CCTGCACTCT 

401 CGCGCACCCA ATCAGACGGT AGCGGAAGTA GGAGCAGTCT GGGCTTAAAT 

451 ATTGGCGGGA TGGGGGATTA TCGAAATGAA ACCTTGACGA CCAACCCGCG 

501 CGACACTGCC TTTCTTTCCC ACTTGGTGCA GACCGTATTT TTCCTGCGCG 

551 GCATAGACGT TGTTTCTCCT GCCAATGCCG ATACAGATGT GTTTATTAAC 

601 ATCGACGTAT TCGGAACGAT ACGCAACAGA ACCGAAATGC ACCTATACAA 

651 TGCCGAAACA CTGAAAGCCC AAACAAAACT GGAATATTTC GCAGTAGACA 

701 GAACCAATAA AAAATTGCTC ATCAAACCCA AAACCAATGC GTTTGAAGCT 

751 GCCTATAAAG AAAATTACGC ATTGTGGATG GGGCCGTATA AAGTAAGCAA 

801 AGGAATCAAA CCGACGGAAG GATTGATGGT CGATTTCTCC GATATCCAAC 

851 CATACGGCAA TCATACGGGT AACTCCGCCC CATCCGTAGA GGCTGATAAC 

901 AGTCATGAGG GGTATGGATA CAGCGATGAA GCAGTGCGAC AACATAGACA 

951 AGGGCAACCT TGA 

This encodes a protein having amino acid sequence <SEQ ID 84; ORF15ng>: 



1 MRARLLIPIL FSVFILSACG TLTGIPSHGG GKRFAVEQEL VAASARAAVK 

51 DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDALIRG EYINSPAVRT 

101 DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSRSSLGLN 

151 IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN

201 IDVFGTIRNR TEMHLYNAET LKAQTKLEYF AVDRTNKKLL IKPKTNAFEA 

251 AYKENYALWM GPYKVSKGIK PTEGLMVDFS DIQPYGNHTG NSAPSVEADN 

301 SHEGYGYSDE AVRQHRQGQP * 

The originally-identified partial strain B sequence (ORF15) shows 97.2% identity over a 213aa
overlap with ORF15ng:



orf 15 . pep MQARLLIPILFSVFILSACGTLTGIPSHGGXKRFAVEQELVAASARAAVKDMDLQALHGR 60 

I : II I I I I I I I I I I ! 1 I I I I I ( I I I I I i i I I I f I I I I I I I I I I I ! I I I 11 f i II I I I I I 

orf15ng MRARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR 60

orf 15. pep KVALYIATMGDQGSGSLTGGRYSIDAXXXGEYINSPAVRTDYTYPRYETTAETTSGGLTG 120 

I II I I 1 I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I II I I I I f I I I I 1 i I I I I I I I 

orf15ng KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG 120



orf 15. pep LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 180 

I 1 I I I II I I I I I I ( 1 1 I I M I II : I I 1 I I I I I I I I I I I I I I II I I I I I I I I I 1 I i I I I I I 

orf15ng LTTSLSTLNAPALSRTQSDGSGSRSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 180

orf15.pep FLRGIDVVSPANADTDVFINIDVFGTIRNRTEM 213

I I I I I I I I I I I I I t I I I I I 11 I I I I II I I I I I I 

orf15ng FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL 240



The complete strain B sequence (ORF15-1) and ORF15ng show 98.8% identity in 320 aa overlap:

10 20 30 40 50 60

orf 15-1 . pep MQARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR 
I : M I M I I I I I ! I I t I I I I I I I I I I I I I I I I I I I I II ) I I I I I 1 I I I I I I I ! 1 I I I I II 
orf 15ng MRARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR 
10 20 30 40 50 60

70 80 90 100 110 120 

orf 15-1. pep KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG 
I I I I I I I I I I I I 1 I I I I I I I I I I I II I I I i I I I I I II I I I I II I ! I If I I I I I II I I I I I 
orf15ng KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 15-1. pep LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 
15 i II I II I M I I I I I I 1 I M I I M : I I I I i I I I I I I I I I I I i I I I I I t I I I I I I i i I I I It 

orf15ng LTTSLSTLNAPALSRTQSDGSGSRSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF

130 140 150 160 170 180 

190 200 210 220 230 240 

orf15-1.pep FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL

I I I I I I I I I I I I I I t t I t I I I I I t I I I I I I I I I I I I I I t t I I I t I I I I I I I I I I i I i I I I 
orf15ng FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL

190 200 210 220 230 240 

250 260 270 280 290 300

orf 15-1 . pep IKPKTNAFEAAYKENYALWMGPYKVSKGIKPTEGLMVDFSDIRPYGNHTGNSAPSVEADN 
I M I ! I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I t I I I I I : I I I I I II II I I I II I I I 
orfl5ng IKPKTNAFEAAYKENYALWMGPYKVSKGIKPTEGLMVDFSDIQPYGNHTGNSAPSVEADN 

250 260 270 280 290 300 


310 320 
orf15-1.pep SHEGYGYSDEVVRQHRQGQPX
I I ! II I I I I I : I I I I I I I I I I 
orf15ng SHEGYGYSDEAVRQHRQGQPX
310 320

Computer analysis of these amino acid sequences reveals an ILSAC motif (putative membrane 
lipoprotein lipid attachment site, as predicted by the MOTIFS program). 

indicates a putative leader sequence, and it was predicted that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies.
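
The position of the ILSAC motif within the N-terminal region can be located with a simple string search. The sketch below is illustrative only and does not reproduce the MOTIFS program: it looks for the literal motif rather than a general lipoprotein signature pattern, and the function name `find_motif` is hypothetical. Positions are reported using 1-based residue numbering.

```python
# N-terminal 30 residues of ORF15-1 (SEQ ID 80).
ORF15_1_N_TERMINUS = "MQARLLIPILFSVFILSACGTLTGIPSHGG"
MOTIF = "ILSAC"


def find_motif(protein: str, motif: str = MOTIF):
    """Return (motif start, cysteine position), 1-based, or None if absent."""
    start = protein.find(motif)
    if start < 0:
        return None
    # The cysteine is the putative lipid attachment residue within the motif.
    cysteine = start + motif.index("C") + 1
    return start + 1, cysteine


print(find_motif(ORF15_1_N_TERMINUS))
```

For ORF15-1 the motif spans residues 15 to 19, placing the candidate lipid-modified cysteine at position 19. In bacterial lipoproteins the signal peptide is typically cleaved immediately before this cysteine, which is consistent with the N-terminal 18 residues acting as a leader sequence.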

ORF15-1 (31.7kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
4A shows the results of affinity purification of the GST-fusion protein, and Figure 4B shows the
results of expression of the His-fusion in E. coli. Purified GST-fusion protein was used to immunise
mice, whose sera were used for Western blot (Figure 4C) and ELISA (positive result). These
experiments confirm that ORF15-1 is a surface-exposed protein, and that it is a useful immunogen.

Example 11 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 85>: 






1 . . GG . CAGCACA AAAAACAGGC GGTTGAACGG AAAAACCGTA TTTACGATGA 
51 TGCCGGGTAT GATATTCGGC GTATTCACGG GCGCATTCTC CGCAAAATAT
101 ATCCCCGCGT TCGGGCTTCA AATTTTCTTC ATCCTGTTTT TAACCGCCGT

151 CGCATTCAAA ACACTGCATA CCGACCCTCA GACGGCATCC CGCCCGCTGC 

201 CCGGACTGCC CrGACTGACT GCGGTTTCCA CACTGTTCGG CACAATGTCG 

251 AGCTGGGTCG GCATAGGCGG CGGTTCACTT TCCGTCCCCT TCTTAATCCA 

301 CTGCGGCTTC CCCGCCCATA AAGCCATCGG CACATCATCC GGCCTTGCCT

351 GGCCGATTGC ACTCTCCGGC GCAATATCGT ATCTGCTCAA CGGCCTGAAT 

401 ATTGCAGGAT TGCCCGAAGG GTCACTGGGC TTCCTTTACC TGCCCGCCGT 

451 CGCCGTCCTC AGCGCGGCAA CCATTGCCTT TGCCCCGCTC GGTGTCAAAA 

501 CCGCCCACAA ACTTTCTTCT GCCAAACTCA AAAAATC.TT CGGCATTATG 

551 TTGCTTTTGA TTGCCGGAAA AATGCTGTAC AACCTGCTTT AA

This corresponds to the amino acid sequence <SEQ ID 86; ORF17>: 

1 ..GQHKKQAVNG KTVFTMMPGM IFGVFTGAFS AKYIPAFGLQ IFFILFLTAV 

51 AFKTLHTDPQ TASRPLPGLP XLTAVSTLFG TMSSWVGIGG GSLSVPFLIH 

101 CGFPAHKAIG TSSGLAWPIA LSGAISYLLN GLNIAGLPEG SLGFLYLPAV 

151 AVLSAATIAF APLGVKTAHK LSSAKLKKSF GIMLLLIAGK MLYNLL*

Further work revealed the complete nucleotide sequence <SEQ ID 87>: 

1 ATGTGGCATT GGGACATTAT CTTAATCCTG CTTGCCGTAG GCAGTGCGGC 

51 AGGTTTTATT GCCGGCCTGT TCGGCGTAGG CGGCGGCACG CTGATTGTCC 

101 CTGTCGTTTT ATGGGTGCTT GATTTGCAGG GTTTGGCACA ACATCCTTAC 

151 GCGCAACACC TCGCCGTCGG CACATCCTTC GCCGTCATGG TCTTCACCGC

201 CTTTTCCAGT ATGCTGGGGC AGCACAAAAA ACAGGCGGTC GACTGGAAAA 

251 CCGTATTTAC GATGATGCCG GGTATGATAT TCGGCGTATT CACGGGCGCA 

301 CTCTCCGCAA AATATATCCC CGCGTTCGGG CTTCAAATTT TCTTCATCCT 

351 GTTTTTAACC GCCGTCGCAT TCAAAACACT GCATACCGAC CCTCAGACGG 

401 CATCCCGCCC GCTGCCCGGA CTGCCCGGAC TGACTGCGGT TTCCACACTG

451 TTCGGCACAA TGTCGAGCTG GGTCGGCATA GGCGGCGGTT CACTTTCCGT 

501 CCCCTTCTTA ATCCACTGCG GCTTCCCCGC CCATAAAGCC ATCGGCACAT 

551 CATCCGGCCT TGCCTGGCCG ATTGCACTCT CCGGCGCAAT ATCGTATCTG 

601 CTCAACGGCC TGAATATTGC AGGATTGCCC GAAGGGTCAC TGGGCTTCCT 

651 TTACCTGCCC GCCGTCGCCG TCCTCAGCGC GGCAACCATT GCCTTTGCCC

701 CGCTCGGTGT CAAAACCGCC CACAAACTTT CTTCTGCCAA ACTCAAAAAA 

751 Tc.TTCGGCA TTATGTTGCT TTTGATTGCC GGAAAAATGC TGTACAACCT 

801 GCTTTAA 

This corresponds to the amino acid sequence <SEQ ID 88; ORF17-1>:

1 MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY
51 AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTVFTMMP GMIFGVFTGA
101 LSAKYIPAFG LQIFFILFLT AVAFKTLHTD PQTASRPLPG LPGLTAVSTL
151 FGTMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL
201 LNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKK
251 XFGIMLLLIA GKMLYNLL*

Computer analysis of this amino acid sequence gave the following results: 

Homology with hypothetical H. influenzae transmembrane protein HI0902 (accession number P44070)
ORF17 and HI0902 proteins show 28% aa identity in 192 aa overlap:

ORF17   3    HKKQAVNGKTVFTMMPGMIFGVFT-GAFSAKYIPAFGLQIF--FILFLTAVAFKTLHTDP 59
             HK + + V + P ++ VF G F + +IF +++L ++ D
HI0902  72   HKLGNIVWQAVRILAPVIMLSVFICGLFIGRLDREISAKIFACLWYLATKMVLSIKKD- 130

ORF17   60   QTASRPLPGLPXLTAVSTLFGTMSSWVGIGGGSLSVPFLIHCGFPAHKAIGTSSGLAWPI 119
             Q ++ L L + h G SS GIGGG VPFL G +AIG+S+ +
HI0902  131

ORF17   120
             +SG S++++G +PE SLG++YLPAV ++A + + LG
HI0902  190

ORF17   180
             F + L+++A M
HI0902  250

Homology with a predicted ORF from N. meningitidis (strain A)

ORF17 shows 96.9% identity over a 196aa overlap with an ORF (ORF17a) from strain A of N.
meningitidis:



                                                  10        20        30
orf17.pep                                 GQHKKQAVNGKTVFTMMPGMIFGVFTGAFS
                                          ||||||||: ||||||||||:||||:||:|
orf17a      QGLAQHPYAQHLAVGTSFAVMVFTAFSSMLGQHKKQAVDWKTVFTMMPGMVFGVFAGALS
                  50        60        70        80        90       100

                    40        50        60        70        80        90
orf17.pep   AKYIPAFGLQIFFILFLTAVAFKTLHTDPQTASRPLPGLPXLTAVSTLFGTMSSWVGIGG
            |||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||
orf17a      AKYIPAFGLQIFFILFLTAVAFKTLHTDPQTASRPLPGLPGLTAVSTLFGTMSSWVGIGG
                 110       120       130       140       150       160

                   100       110       120       130       140       150
orf17.pep   GSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAV
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf17a      GSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAV
                 170       180       190       200       210       220

                   160       170       180       190
orf17.pep   AVLSAATIAFAPLGVKTAHKLSSAKLKKSFGIMLLLIAGKMLYNLLX
            |||||||||||||||||||||||||||||||||||||||||||||||
orf17a      AVLSAATIAFAPLGVKTAHKLSSAKLKKSFGIMLLLIAGKMLYNLLX
                 230       240       250       260


The complete length ORF17a nucleotide sequence <SEQ ID 89> is:



1 ATGTGGCATT GGGACATTAT CTTAATCCTG CTTGCCGTAG GCAGTGCGGC
51 AGGTTTTATT GCCGGCCTGT TCGGCGTAGG CGGCGGCACG CTGATTGTCC
101 CTGTCGTTTT ATGGGTGCTT GATTTGCAGG GTTTGGCACA ACATCCTTAC
151 GCGCAACACC TCGCCGTCGG CACATCCTTC GCCGTCATGG TCTTCACCGC
201 CTTTTCCAGT ATGCTGGGGC AGCACAAAAA ACAGGCGGTC GACTGGAAAA
251 CCGTATTTAC GATGATGCCG GGTATGGTAT TCGGCGTATT CGCTGGCGCA
301 CTCTCCGCAA AATATATCCC AGCGTTCGGG CTTCAAATTT TCTTCATCCT
351 GTTTTTAACC GCCGTCGCAT TCAAAACACT GCATACCGAC CCTCAGACGG
401 CATCCCGCCC GCTGCCCGGA CTGCCCGGAC TGACTGCGGT TTCCACACTG
451 TTCGGCACAA TGTCGAGCTG GGTCGGCATA GGCGGCGGTT CACTTTCCGT
501 CCCCTTCTTA ATCCACTGCG GCTTCCCCGC CCATAAAGCC ATCGGCACAT
551 CATCCGGCCT TGCCTGGCCG ATTGCACTCT CCGGCGCAAT ATCGTATCTG
601 CTCAACGGCC TGAATATTGC AGGATTGCCC GAAGGGTCAC TGGGCTTCCT
651 TTACCTGCCC GCCGTCGCCG TCCTCAGCGC GGCAACCATT GCCTTTGCCC
701 CGCTCGGTGT CAAAACCGCC CACAAACTTT CTTCTGCCAA ACTCAAAAAA
751 TCCTTCGGCA TTATGTTGCT TTTGATTGCC GGAAAAATGC TGTACAACCT
801 GCTTTAA

This encodes a protein having amino acid sequence <SEQ ID 90>: 



1 MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY
51 AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTVFTMMP GMVFGVFAGA
101 LSAKYIPAFG LQIFFILFLT AVAFKTLHTD PQTASRPLPG LPGLTAVSTL
151 FGTMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL
201 LNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKK
251 SFGIMLLLIA GKMLYNLL*

ORF17a and ORF17-1 show 98.9% identity in 268 aa overlap: 



10 20 30 40 50 60 

orf17a.pep  MWHWDIILILLAVGSAAGFIAGLFGVGGGTLIVPVVLWVLDLQGLAQHPYAQHLAVGTSF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf17-1     MWHWDIILILLAVGSAAGFIAGLFGVGGGTLIVPVVLWVLDLQGLAQHPYAQHLAVGTSF

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 17a . pep AVMVFT AFS SMLGQHKKQAVDWKT VFTMMPGMV FG VFAGAL S AK Y I PAFGLQ IFF I LFLT 
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| | | j ( M ( i I I [ I ( 1 i t I 1 t I I I I t I t t I I M : I ! I I : I I I I I M I I I I I I I I I I I I I M 
orf 17-1 A VMVFTAFSSMLGQHKKQAVDWKTVFTMMPGMIFGVFTGALSAKYIPAFGLQIFFILFLT 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 17a pep AVAFKTLHTDPQTASRPLPGLPGLTAVSTLFGTMSSWVGIGGGSLSVPFLIHCGFPAHKA 
t I I I I I I I I I t I I I t I I I I I I ! ! I I M I I I 1 I I I N I 1 I I ! I I I I M I M M I I I I i I M 
orf 17-1 AVAFKTLHTDPQTASRPLPGLPGLTAVSTLFGTMSSWVGIGGGSLSVPFLIHCGFPAHKA 
130 140 150 160 170 • 180 

190 200 210 220 230 240 

orf 17a pep IGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVKTA 
| | | I I 1 I I I I I I I I I I II I I 1 I 1 11 i I I i I I I i i I I I I M I I i I I I I I I I i i I M I I i i i 
orf 17-1 IGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVKTA 

190 200 210 220 230 240 



250 260 269 

orf 17a . pep HKLSSAKLKKSFGIMLLLIAGKMLYNLLX 
I 1 I I I I I I I I I II ! I i I I I I i I I I I I II 
orf 17-1 HKLSSAKLKKXFGIMLLLIAGKMLYNLLX 

250 260 



Homology with a predicted ORF from N. gonorrhoeae 

ORF17 shows 93.9% identity over a 196aa overlap with a predicted ORF (ORF17.ng) from N. 
gonorrhoeae: 

orf17.pep                                 GQHKKQAVNGKTVFTMMPGMIFGVFTGAFS  30
                                          ||||||||  || | |||||||||| || |
orf17ng     QGLAQHPYAQHLAVGTSFAVMVFTAFSSMLGQHKKQAVDWKTIFAMMPGMIFGVFAGALS  102

orf17.pep   AKYIPAFGLQIFFILFLTAVAFKTLHTDPQTASRPLPGLPXLTAVSTLFGTMSSWVGIGG  90
            |||||||||||||||||||||||||||  ||||||||||| ||||||||| |||||||||
orf17ng     AKYIPAFGLQIFFILFLTAVAFKTLHTGRQTASRPLPGLPGLTAVSTLFGAMSSWVGIGG  162

orf17.pep   GSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAV  150
            |||||||||||||||||||||||||||||||||||||| |||||||||||||||||||||
orf17ng     GSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLVNGLNIAGLPEGSLGFLYLPAV  202

orf17.pep   AVLSAATIAFAPLGVKTAHKLSSAKLKKSFGIMLLLIAGKMLYNLL  196
            ||||||||||||||||||||||||||| ||||||||||||||||||
orf17ng     AVLSAATIAFAPLGVKTAHKLSSAKLKESFGIMLLLIAGKMLYNLL  268



An ORF17ng nucleotide sequence <SEQ ID 91> is predicted to encode a protein having amino acid
sequence <SEQ ID 92>:

1 MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY

51 AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTIFAMMP GMIFGVFAGA 

101 LSAKYIPAFG LQIFFILFLT AVAFKTLHTG RQTASRPLPG LPGLTAVSTL 

151 FGAMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL 

201 VNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKE 

251 SFGIMLLLIA GKMLYNLL* 

Further work revealed the complete gonococcal DNA sequence <SEQ ID 93>: 



1 ATGTGGCATT GGGACATTAT CTTAATCCTG CTTGCcgtag gcAGTGCGGC 

51 AGGTTTTATT GCCGGCCTGT Tcggtgtagg cggcgGTACG CTGATTGTCC 

101 CTGTCGTTTT ATGGGTGCTT GATTTGCAGG GTTTGGCACA ACATCCTTAC 

151 GCGCAACACC TCGCCGTCGG CAcaTccttc gcCGTCATGG TCTTCACCGC 

201 CTTTTCCAGT ATGTTGGGGC AGCACAAAAA ACAGGCGGTC GACTGGAAAA 

251 CCATATTTGC GATGATGCCG GGTATGATAT TCGGCGTATT CGCTGGCGCA 

301 CTCTCCGCAA AATATATCCC CGCGTTCGGG CTTCAAATTT TCTTCATCCT 

351 GTTTTTAACC GCCGTCGCAT TCAAAACACT GCATACCGGT CGTCAGACGG 

401 CATCCCGCCC GCTGCCCGGG CTGCCCGGAC TGACTGCGGT TTCCACACTG

451 TTCGGCGCAA TGTCGAGCTG GGTCGGCATA GGCGGCGGTT CACTTTCCGT

501 CCCCTTCTTA ATCCACTGCG GCTTCCCCGC CCATAAAGCC ATCGGCACAT

551 CATCCGGCCT TGCCTGGCCG ATTGCACTCT CCGGCGCAAT ATCGTATCTG

601 GTCAACGGTC TGAATATTGC AGGATTGCCC GAAGGGTCGC TGGGCTTCCT 

651 TTACCTGCCC GCCGTCGCCG TCCTCAGCGC GGCAACCATT GCCTTTGCCC 

701 CGCTCGGTGT CAAAACCGCC CACAAACTTT CTTCTGCCAA ACTCAAAGAA

751 TCCTTCGGCA TTATGTTGCT TTTGATTGCC GGAAAAATGC TGTACAACCT

801 GCTTTAA 

This corresponds to the amino acid sequence <SEQ ID 94; ORF17ng-1>:

1 MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY

51 AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTIFAMMP GMIFGVFAGA

101 LSAKYIPAFG LQIFFILFLT AVAFKTLHTG RQTASRPLPG LPGLTAVSTL

151 FGAMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL

201 VNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKE

251 SFGIMLLLIA GKMLYNLL*

ORF17ng-1 and ORF17-1 show 96.6% identity in 268 aa overlap:

                    10        20        30        40        50        60
orf17-1.pep MWHWDIILILLAVGSAAGFIAGLFGVGGGTLIVPVVLWVLDLQGLAQHPYAQHLAVGTSF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf17ng-1   MWHWDIILILLAVGSAAGFIAGLFGVGGGTLIVPVVLWVLDLQGLAQHPYAQHLAVGTSF
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf17-1.pep AVMVFTAFSSMLGQHKKQAVDWKTVFTMMPGMIFGVFTGALSAKYIPAFGLQIFFILFLT
            |||||||||||||||||||||||| | |||||||||| ||||||||||||||||||||||
orf17ng-1   AVMVFTAFSSMLGQHKKQAVDWKTIFAMMPGMIFGVFAGALSAKYIPAFGLQIFFILFLT
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf17-1.pep AVAFKTLHTDPQTASRPLPGLPGLTAVSTLFGTMSSWVGIGGGSLSVPFLIHCGFPAHKA
            |||||||||  ||||||||||||||||||||| |||||||||||||||||||||||||||
orf17ng-1   AVAFKTLHTGRQTASRPLPGLPGLTAVSTLFGAMSSWVGIGGGSLSVPFLIHCGFPAHKA
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf17-1.pep IGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVKTA
            |||||||||||||||||||| |||||||||||||||||||||||||||||||||||||||
orf17ng-1   IGTSSGLAWPIALSGAISYLVNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVKTA
                   190       200       210       220       230       240

                   250       260      269
orf17-1.pep HKLSSAKLKKXFGIMLLLIAGKMLYNLLX
            |||||||||  ||||||||||||||||||
orf17ng-1   HKLSSAKLKESFGIMLLLIAGKMLYNLLX
                   250       260



In addition, ORF17ng-1 shows significant homology with a hypothetical H. influenzae protein:

sp|P44070|Y902_HAEIN HYPOTHETICAL PROTEIN HI0902 pir||G64015 hypothetical protein
HI0902 - Haemophilus influenzae (strain Rd KW20) gi|1573922 (U32772) H. influenzae
predicted coding region HI0902 [Haemophilus influenzae] Length = 264

Score = 74 (34.9 bits), Expect = 1.6e-23, Sum P(2) = 1.6e-23
Identities = 15/43 (34%), Positives = 23/43 (53%)

Query:  55 AVGTSFAVMVFTAFSSMLGQHKKQAVDWKTIFAMMPGMIFGVF 97
           A+GTSFA +V T   S    HK   + W+ +  + P ++  VF
Sbjct:  52 ALGTSFATIVITGIGSAQRHHKLGNIVWQAVRILAPVIMLSVF 94

Score = 195 (91.9 bits), Expect = 1.6e-23, Sum P(2) = 1.6e-23
Identities = 44/114 (38%), Positives = 65/114 (57%)

Query: 150 LFGAMSSWVGIGGGSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLVNGLNIAGL 209
           L G  SS  GIGGG   VPFL   G    +AIG+S+     + +SG  S++V+G     +
Sbjct: 148 LIGMASSAAGIGGGGFIVPFLTARGINIKQAIGSSAFCGMLLGISGMFSFIVSGWGNPLM 207

Query: 210 PEGSLGFLYLPAVAVLSAATIAFAPLGVKTAHKLSSAKLKESFGIMLLLIAGKM 263
           PE SLG++YLPAV  ++A +   + LG     KL  + LK+ F + L+++A  M
Sbjct: 208 PEYSLGYIYLPAVLGITATSFFTSKLGASATAKLPVSTLKKGFALFLIVVAINM 261

This analysis, including the homology with the hypothetical H. influenzae transmembrane protein,
suggests that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 
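
The identity figures quoted for the ungapped overlaps in this example can be checked by direct residue counting. The following Python sketch is given for illustration only and is not part of the original analysis: the function name `percent_identity` is hypothetical, and it assumes that a position holding the undetermined residue X is counted as a mismatch. The two strings are the 196 aa ORF17 overlap and the corresponding ORF17ng residues (43 to 238), transcribed from the alignment above.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions carrying the same residue (ungapped)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("an ungapped comparison needs equal-length sequences")
    # Assumption: 'X' (undetermined residue) never counts as a match.
    matches = sum(a == b and a != "X" for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# ORF17 partial sequence (196 aa), one string per alignment block.
ORF17 = (
    "GQHKKQAVNGKTVFTMMPGMIFGVFTGAFS"
    "AKYIPAFGLQIFFILFLTAVAFKTLHTDPQTASRPLPGLPXLTAVSTLFGTMSSWVGIGG"
    "GSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAV"
    "AVLSAATIAFAPLGVKTAHKLSSAKLKKSFGIMLLLIAGKMLYNLL"
)

# Overlapping part of ORF17ng, same block layout.
ORF17NG = (
    "GQHKKQAVDWKTIFAMMPGMIFGVFAGALS"
    "AKYIPAFGLQIFFILFLTAVAFKTLHTGRQTASRPLPGLPGLTAVSTLFGAMSSWVGIGG"
    "GSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLVNGLNIAGLPEGSLGFLYLPAV"
    "AVLSAATIAFAPLGVKTAHKLSSAKLKESFGIMLLLIAGKMLYNLL"
)

print(f"{percent_identity(ORF17, ORF17NG):.1f}% identity over {len(ORF17)} aa")
```

Under these assumptions the count is 184 identical residues out of 196, which agrees with the 93.9% stated for ORF17 and ORF17ng.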

Example 12 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 95>: 

1 . . GGAAACGGAT GGCAGGCAGA CCCCGAACAT CCGCTGCTCG GGCTTTTTGC 

51 CGTCAGTAAT GTATCGATGA CGCTTGCTTT TGTCGGAATA TGTGCGTTGG 

101 TGCATTATTG CTTTTCGGGA ACGGTTCAAG TGTTTGTGTT TGCGGCACTG 

151 CTCAAACTTT ATGCGCTGAA GCCGGTTTAT TGGTTCGTGT TGCAGTTTGT 

201 GCTGATGGCG GTTGCCTATG TCCACCGCTG CGGTATAGAC CGGCAGCCGC 

251 CGTCAACGTT CGGCGGCTCG CAGCTGCGAC TCGGCGGGTT GACGGCAGCG 

301 TTGATGCAGG TCTCGGTACT GGTGCTGCTG CTTTCAGAAA TTGGAAGATA 

351 A 

This corresponds to the amino acid sequence <SEQ ID 96; ORF18>: 

1 . . GNGWQADPEH PLLGLFAVSN VSMTLAFVGI CALVHYCFSG TVQVFVFAAL 
51 LKLYALKPVY WFVLQFVLMA VAYVHRCGID RQPPSTFGGS QLRLGGLTAA 
101 LMQVSVLVLL LSEIGR* 
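
The correspondence between a nucleotide sequence and the amino acid sequence it encodes follows from the standard genetic code. A minimal Python sketch is given below for illustration; it is not the software used in the original analysis. The name `translate` is hypothetical, the sketch assumes reading frame 1, and as a simplification it reports any codon containing an undetermined base (N) as X. The example inputs are the first 30 and the last 18 bases of <SEQ ID 95>.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; '*' marks a stop codon, 'X' an unresolved codon."""
    dna = "".join(dna.split()).upper()   # drop the spacing used in the listings
    usable = len(dna) - len(dna) % 3     # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


print(translate("GGAAACGGAT GGCAGGCAGA CCCCGAACAT"))  # start of <SEQ ID 95>
print(translate("TCAGAAATTGGAAGATAA"))                # end of <SEQ ID 95>
```

The two fragments translate to GNGWQADPEH and SEIGR*, matching the first ten and last five residues (plus the stop) of <SEQ ID 96>.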

Further work revealed the complete nucleotide sequence <SEQ ID 97>: 

1 ATGATTTTGC TGCATTTGGA TTTTTTGTCT GCCTTACTGT ATGCGGCGGT 

51 TTTTCTGTTT CTGATATTCC GCGCAGGAAT GTTGCAATGG TTTTGGGCGA 

101 GTATTATGCT GTGGCTGGGC ATATCGGTTT TGGGGGCAAA GCTGATGCCC 

151 GGCATATGGG GAATGACCCG CGCCGCGCCC TTGTTCATCC CCCATTTTTA 

201 CCTGACTTTG GGCAGCATAT TTTTTTTCAT CGGGCATTGG AACCGGAAAA

251 CAGATGGAAA CGGATGGCAG GCAGACCCCG AACATCCGCT GCTCGGGCTT

301 TTTGCCGTCA GTAATGTATC GATGACGCTT GCTTTTGTCG GAATATGTGC 

351 GTTGGTGCAT TATTGCTTTT CGGGAACGGT TCAAGTGTTT GTGTTTGCGG 

401 CACTGCTCAA ACTTTATGCG CTGAAGCCGG TTTATTGGTT CGTGTTGCAG 

451 TTTGTGCTGA TGGCGGTTGC CTATGTCCAC CGCTGCGGTA TAGACCGGCA 

501 GCCGCCGTCA ACGTTCGGCG GCTCGCAGCT GCGACTCGGC GGGTTGACGG 

551 CAGCGTTGAT GCAGGTCTCG GTACTGGTGC TGCTGCTTTC AGAAATTGGA 

601 AGATAA 

This corresponds to the amino acid sequence <SEQ ID 98; ORF18-1>:

1 MILLHLDFLS ALLYAAVFLF LIFRAGMLQW FWASIMLWLG ISVLGAKLMP
51 GIWGMTRAAP LFIPHFYLTL GSIFFFIGHW NRKTDGNGWQ ADPEHPLLGL
101 FAVSNVSMTL AFVGICALVH YCFSGTVQVF VFAALLKLYA LKPVYWFVLQ
151 FVLMAVAYVH RCGIDRQPPS TFGGSQLRLG GLTAALMQVS VLVLLLSEIG
201 R*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A)

ORF18 shows 98.3% identity over a 116aa overlap with an ORF (ORF18a) from strain A of N.
meningitidis:

                                                  10        20        30
orf18.pep                                 GNGWQADPEHPLLGLFAVSNVSMTLAFVGI
                                          ||||||||||||||||||||||||||||||
orf18a      TRAAPLFIPHFYLTLGSIFFFIGHWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGI
               60        70        80        90       100       110

                    40        50        60        70        80        90
orf18.pep   CALVHYCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGS
            ||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||
orf18a      CALVHYCFSXTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGS
              120       130       140       150       160       170

                   100       110
orf18.pep   QLRLGGLTAALMQVSVLVLLLSEIGRX
            ||||||||||||| |||||||||||||
orf18a      QLRLGGLTAALMQXSVLVLLLSEIGRX
              180       190       200

The complete length ORF18a nucleotide sequence <SEQ ID 99> is: 

1 ATGATTTTGC TGCATTTGGA TTTTTTGTCT GCCTTACTGT ATGCGGCGGT 

51 TTTTCTGTTT CTGATATTCC GCGCAGGAAT GTTGCAATGG TTTTGGGCGA 

101 GTATTATGCT GTGGCTGGGC ATATCGGTTT TGGGGGCAAA GCTGATGCCC 

151 GGCATATGGG GAATGACCCG CGCCGCGCCC TTGTTCATCC CCCATTTTTA 

201 CCTGACTTTG GGCAGCATAT TTTTTTTCAT CGGGCATTGG AACCGGAAAA 

251 CGGATGGAAA CGGATGGCAG GCAGACCCCG AACATCCTCT GCTCGGGCTG 

301 TTTGCCGTCA GTAATGTATC GATGACGCTT GCTTTTGTCG GAATATGTGC 

351 GTTGGTGCAT TATTGCTTTT CGNGAACGGT TCAAGTGTTT GTGTTTGCGG 

401 CACTGCTCAA ACTTTATGCG CTGAAGCCGG TTTATTGGTT CGTGTTGCAG 

451 TTTGTGCTGA TGGCGGTTGC CTATGTCCAC CGCTGCGGTA TAGACCGGCA

501 GCCGCCGTCA ACGTTCGGCG GNTCGCAGCT GCGACTCGGC GGGTTGACGG 

551 CAGCGTTGAT GCAGNTCTCG GTACTGGTGC TGCTGCTTTC AGAAATTGGA 

601 AGATAA 

This encodes a protein having amino acid sequence <SEQ ID 100>: 

1 MILLHLDFLS ALLYAAVFLF LIFRAGMLQW FWASIMLWLG ISVLGAKLMP
51 GIWGMTRAAP LFIPHFYLTL GSIFFFIGHW NRKTDGNGWQ ADPEHPLLGL
101 FAVSNVSMTL AFVGICALVH YCFSXTVQVF VFAALLKLYA LKPVYWFVLQ
151 FVLMAVAYVH RCGIDRQPPS TFGGSQLRLG GLTAALMQXS VLVLLLSEIG
201 R*

ORF18a and ORF18-1 show 99.0% identity in 201 aa overlap:

                    10        20        30        40        50        60
orf18a.pep  MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIMLWLGISVLGAKLMPGIWGMTRAAP
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf18-1     MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIMLWLGISVLGAKLMPGIWGMTRAAP
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf18a.pep  LFIPHFYLTLGSIFFFIGHWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf18-1     LFIPHFYLTLGSIFFFIGHWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf18a.pep  YCFSXTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGSQLRLG
            |||| |||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf18-1     YCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGSQLRLG
                   130       140       150       160       170       180

                   190       200
orf18a.pep  GLTAALMQXSVLVLLLSEIGRX
            |||||||| |||||||||||||
orf18-1     GLTAALMQVSVLVLLLSEIGRX
                   190       200

Homology with a predicted ORF from N. gonorrhoeae

ORF18 shows 93.1% identity over a 116aa overlap with a predicted ORF (ORF18.ng) from N.
gonorrhoeae:

orf18.pep                                 GNGWQADPEHPLLGLFAVSNVSMTLAFVGI  30
                                          ||||||||||||||||||||||||||||||
orf18ng     TRAAPLFIPHFYLTLGSIFFFIGYWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGI  115

orf18.pep   CALVHYCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGS  90
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf18ng     CALVHYCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGS  175

orf18.pep   QLRLGGLTAALMQVSVLVLLLSEIGR  116
            ||||| | | |||| |   || ||||
orf18ng     QLRLGVLAAMLMQVAVTAMLLAEIGR  201

The complete length ORF18ng nucleotide sequence is <SEQ ID 101>:

1 ATGATTTTGC TGCATTTGGA TTTTTTGTCT GCCTTACTGt aTGCGGcggt

51 tttTctgTTT CTGATATTCC GCGCAGGAAT GTTGCAATGG TTTTGGGCGA

101 GTATTGCGTT GTGGCTCGGC ATCTCGGTTT TAGGGGTAAA GCTGATGCCG

151 GGGATGTGGG GAATGACCCG CGCCGCGCCT TTGTTCATCC CCCATTTTTA

201 CCTGACTTTG GGCAGCATAT TTTTTTTCAT CGGGTATTGG AACCGGAAAA

251 CAGATGGAAA CGGATGGCAG GCAGACCCCG AACATCCGCT GCTCGGGCTT

301 TTTGCCGTCA GTAATGTATC GATGACGCTT GCTTTTGTCG GAATATGTGC

351 GTTGGTGCAT TATTGCTTTT CGGGAACGGT TCAAGTGTTT GTGTTTGCGG

401 CATTGCTCAA ACTTTATGCG CTGAAGCCGG TTTATTGGTT CGTGTTGCAG

451 TTTGTATTGA TGGCGGttgC CTATGTCCAC CGCTGCGGTA TAGACCGGCA

501 GCCGCCGTCA ACGTTCGGCG GTTCGCAGCT GCGACTCGGC GTGTTGGCGG

551 CGATGTTGAT GCAGGTTGCG GTAACGGCGA TGCTGCTTGC CGAAATCGGC

601 AGATGA



This encodes a protein having amino acid sequence <SEQ ID 102>: 



1 MILLHLDFLS ALLYAAVFLF LIFRAGMLQW FWASIALWLG ISVLGVKLMP

51 GMWGMTRAAP LFIPHFYLTL GSIFFFIGYW NRKTDGNGWQ ADPEHPLLGL

101 FAVSNVSMTL AFVGICALVH YCFSGTVQVF VFAALLKLYA LKPVYWFVLQ

151 FVLMAVAYVH RCGIDRQPPS TFGGSQLRLG VLAAMLMQVA VTAMLLAEIG

201 R*



This ORF18ng protein sequence shows 94.0% identity in 201 aa overlap with ORF18-1:

                    10        20        30        40        50        60
orf18-1.pep MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIMLWLGISVLGAKLMPGIWGMTRAAP
            ||||||||||||||||||||||||||||||||||| ||||||||| ||||| ||||||||
orf18ng     MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIALWLGISVLGVKLMPGMWGMTRAAP
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf18-1.pep LFIPHFYLTLGSIFFFIGHWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH
            |||||||||||||||||| |||||||||||||||||||||||||||||||||||||||||
orf18ng     LFIPHFYLTLGSIFFFIGYWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf18-1.pep YCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGSQLRLG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf18ng     YCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGSQLRLG
                   130       140       150       160       170       180

                   190       200
orf18-1.pep GLTAALMQVSVLVLLLSEIGRX
             | | |||| |   || |||||
orf18ng     VLAAMLMQVAVTAMLLAEIGRX
                   190       200



Based on this analysis, including the presence of several putative transmembrane domains in the 
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
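
The text does not state how the putative transmembrane domains were predicted. One common approach, shown here purely as an assumed illustration, is a Kyte-Doolittle hydropathy scan: the mean hydropathy is computed over a sliding window (19 residues is a customary width for membrane-spanning segments) and windows scoring above a threshold of about 1.6 are flagged. The function names, the threshold default, and the scoring of undetermined residues as 0 are hypothetical choices made for this sketch.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of every full window along the sequence."""
    # Assumption: unknown residues such as 'X' contribute 0 to the window mean.
    scores = [KD.get(aa, 0.0) for aa in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_segments(protein, window=19, threshold=1.6):
    """1-based (start, end) ranges covered by windows above the threshold."""
    segments = []
    for i, score in enumerate(hydropathy_profile(protein, window)):
        if score <= threshold:
            continue
        start, end = i + 1, i + window
        if segments and start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], end)  # merge overlapping windows
        else:
            segments.append((start, end))
    return segments


# First 50 residues of the gonococcal protein <SEQ ID 102>.
ORF18NG_NTERM = "MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIALWLGISVLGVKLMP"
print(candidate_segments(ORF18NG_NTERM))
```

A scan of this kind only nominates hydrophobic stretches; it does not by itself establish membrane topology.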
Example 13

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 103>:

1 ATGAAAACCC CACTCCTCAA GCCTCTGCTN ATTACCTCGC TTCCCGTTTT 

51 CGCCAGTGTT TTTACCGCCG CCTCCATCGT CTGGCAGCTA GGCGAACCCA 

101 AGCTCGCCAT GCCCTTCGTA CTCGGCATCA TCGCCGGCGG CCTTGTCGAT 

151 TTGGACAACC NCNTGACCGG ACGGCTNAAA AACATCATCA CCACCGTCGC 

201 CCTGTTCACC CTCTCCTCGC TCACGGCACA AAGCACCCTC GGCACAGGGC 

251 TGCCCTTCAT CCTCGCCATG ACCCTGATGA CTT.CG.CTT CACCATTTTA 

301 GGCGCGGNCG . . . 

This corresponds to the amino acid sequence <SEQ ID 104; ORF19>: 

1 MKTPLLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD 

51 LDNXXTGRLK NIITTVALFT LSSLTAQSTL GTGLPFILAM TLMTXXFTIL 

101 GAX... 

Further work revealed the complete nucleotide sequence <SEQ ID 105>: 

1 ATGAAAACCC CACTCCTCAA GCCTCTGCTC ATTACCTCGC TTCCCGTTTT 

51 CGCCAGTGTT TTTACCGCCG CCTCCATCGT CTGGCAGCTA GGCGAACCCA 

101 AGCTCGCCAT GCCCTTCGTA CTCGGCATCA TCGCCGGCGG CCTTGTCGAT 

151 TTGGACAACC GCCTGACCGG ACGGCTGAAA AACATCATCA CCACCGTCGC 

201 CCTGTTCACC CTCTCCTCGC TCACGGCACA AAGCACCCTC GGCACAGGGC 

251 TGCCCTTCAT CCTCGCCATG ACCCTGATGA CCTTCGGCTT CACCATTTTA 

301 GGCGCGGTCG GGCTCAAATA CCGCACCTTC GCCTTCGGTG CACTCGCCGT 

351 CGCCACCTAC ACCACACTTA CCTACACCCC CGAAACCTAC TGGCTGACCA 

401 ACCCCTTCAT GATTTTATGC GGCACCGTAC TGTACAGCAC CGCCATCCTC

451 CTGTTCCAAA TCGTCCTGCC CCACCGCCCC GTCCAAGAAA GCGTCGCCAA

501 CGCCTACGAC GCACTCGGCG GCTACCTCGA AGCCAAAGCC GACTTCTTCG

551 ACCCCGATGA GGCAGCCTGG ATAGGCAACC GCCACATCGA CCTCGCCATG 

601 AGCAACACCG GCGTCATCAC CGCCTTCAAC CAATGCCGTT CCGCCCTGTT 

651 TTACCGCCTT CGCGGCAAAC ACCGCCACCC GCGCACCGCC AAAATGCTGC 

701 GTTACTACTT TGCCGCCCAA GACATACACG AACGCATCAG CTCCGCCCAC

751 GTCGATTATC AGGAAATGTC CGAAAAATTC AAAAACACCG ACATCATCTT 

801 CCGCATCCAC CGCCTGCTCG AAATGCAGGG ACAAGCCTGC CGCAACACCG 

851 CCCAAGCCCT GCGCGCAAGC AAAGACTACG TTTACAGCAA ACGCCTCGGC 

901 CGCGCCATCG AAGGCTGCCG CCAATCGCTG CGCCTCCTTT CAGACAGCAA 

951 CGACAGTCCC GACATCCGCC ACCTGCGCCG CCTTCTCGAC AACCTCGGCA 

1001 GCGTCGACCA GCAGTTCCGC CAACTCCAGC ACAACGGCCT GCAGGCAGAA 

1051 AACGACCGCA TGGGCGACAC CCGCATCGCC GCCCTCGAAA CCAGCAGCCT 

1101 CAAAAACACC TGGCAGGCAA TCCGTCCGCA GCTAAACCTC GAATCAGGCG 

1151 TATTCCGCCA TGCCGTCCGC CTGTCCCTCG TCGTTGCCGC CGCCTGCACC 

1201 ATCGTCGAAG CCCTCAACCT CAACCTCGGC TACTGGATAC TACTGACCGC 

1251 CCTTTTCGTC TGCCAACCCA ACTACACCGC CACCAAAAGC CGCGTCCGCC 

1301 AGCGCATCGC CGGCACCGTA CTCGGCGTAA TCGTCGGCTC GCTCGTCCCC 

1351 TACTTCACCC CGTCTGTCGA AACCAAACTC TGGATTGTCA TCGCCAGTAC 

1401 CACCCTCTTT TTCATGACCC GCACCTACAA ATACAGTTTC TCCACCTTCT 

1451 TCATTACCAT TCAAGCCCTG ACCAGCCTCT CCCTCGCAGG TTTGGACGTA

1501 TACGCCGCCA TGCCCGTACG CATCATCGAC ACCATTATCG GCGCATCCCT

1551 TGCCTGGGCG GCAGTCAGCT ACCTGTGGCC AGACTGGAAA TACCTCACGC 

1601 TCGAACGCAC CGCCGCCCTT GCCGTATGCA GCAACGGTGC CTATCTCGAA 

1651 AAAATCACCG AACGCCTCAA AAGCGGCGAA ACCGGCGACG ACGTCGAATA 

1701 CCGCGCCACC CGCCGCCGCG CCCACGAACA CACCGCCGCC CTCAGCAGCA 

1751 CCCTTTCCGA CATGAGCAGC GAACCCGCAA AATTCGCCGA CAGCCTGCAA 

1801 CCCGGCTTTA CCCTGCTCAA AACCGGCTAC GCCCTGACCG GCTACATCTC 

1851 CGCCCTCGGC GCATACCGCA GCGAAATGCA CGAAGAATGC AGCCCCGACT 

1901 TTACCGCACA GTTCCACCTC GCCGCCGAAC ACACCGCCCA CATCTTCCAA 

1951 CACCTGCCCG AAACCGAACC CGACGACTTT CAGACAGCAC TGGATACACT 

2001 GCGCGGCGAA CTCGACACCC TCCGCACCCA CAGCAGCGGA ACACAAAGCC 

2051 ACATCCTCCT CCAACAGCTC CAACTCATCG CCCGACAGCT CGAACCCTAC 

2101 TACCGCGCCT ACCGCCAAAT TCCGCACAGG CAGCCCCAAA ATGCAGCCTG 

2151 A 

This corresponds to the amino acid sequence <SEQ ID 106; ORF19-1>:

1 MKTPLLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD

51 LDNRLTGRLK NIITTVALFT LSSLTAQSTL GTGLPFILAM TLMTFGFTIL

101 GAVGLKYRTF AFGALAVATY TTLTYTPETY WLTNPFMILC GTVLYSTAIL

151 LFQIVLPHRP VQESVANAYD ALGGYLEAKA DFFDPDEAAW IGNRHIDLAM

201 SNTGVITAFN QCRSALFYRL RGKHRHPRTA KMLRYYFAAQ DIHERISSAH

251 VDYQEMSEKF KNTDIIFRIH RLLEMQGQAC RNTAQALRAS KDYVYSKRLG

301 RAIEGCRQSL RLLSDSNDSP DIRHLRRLLD NLGSVDQQFR QLQHNGLQAE

351 NDRMGDTRIA ALETSSLKNT WQAIRPQLNL ESGVFRHAVR LSLVVAAACT

401 IVEALNLNLG YWILLTALFV CQPNYTATKS RVRQRIAGTV LGVIVGSLVP

451 YFTPSVETKL WIVIASTTLF FMTRTYKYSF STFFITIQAL TSLSLAGLDV

501 YAAMPVRIID TIIGASLAWA AVSYLWPDWK YLTLERTAAL AVCSNGAYLE

551 KITERLKSGE TGDDVEYRAT RRRAHEHTAA LSSTLSDMSS EPAKFADSLQ

601 PGFTLLKTGY ALTGYISALG AYRSEMHEEC SPDFTAQFHL AAEHTAHIFQ

651 HLPETEPDDF QTALDTLRGE LDTLRTHSSG TQSHILLQQL QLIARQLEPY

701 YRAYRQIPHR QPQNAA*

Computer analysis of this amino acid sequence gave the following results:

Homology with predicted transmembrane protein YHFK of H. influenzae (accession number P44289)
ORF19 and YHFK proteins show 45% aa identity in 97 aa overlap:

orf19   6 LKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNXXTGRLKNIITT 65
          L   +I+++PVF +V  AA  +W       +MP +LGIIAGGLVDLDN  TGRLKN+  T
YHFK    5 LNAKVISTIPVFIAVNIAAVGIWFFDISSQSMPLILGIIAGGLVDLDNRLTGRLKNVFFT 64

orf19  66 VALFTLSSLTAQSTLGTGLPFILAMTLMTXXFTILGA 102
          +  F++SS   Q  +G  + +I+ MT++T  FT++GA
YHFK   65 LIAFSISSFIVQLHIGKPIQYIVLMTVLTFIFTMIGA 101



Homology with a predicted ORF from N. meningitidis (strain A)

ORF19 shows 92.2% identity over a 102aa overlap with an ORF (ORF19a) from strain A of N.
meningitidis:

                    10        20        30        40        50        60
orf19.pep   MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNXXTGRLK
            |||| ||||||||||||||||||||||||||||||||||||||||||||||||  |||||
orf19a      MKTPPLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK
                    10        20        30        40        50        60

                    70        80        90       100
orf19.pep   NIITTVALFTLSSLTAQSTLGTGLPFILAMTLMTXXFTILGAX
            ||| |||||||||| |||||||||||||||||||  ||| ||
orf19a      NIIATVALFTLSSLVAQSTLGTGLPFILAMTLMTFGFTIMGAVGLKYRTFAFGALAVATY
                    70        80        90       100       110       120

orf19a      TTLTYTPETYWLTNPFMILCGTVLYSTAIILFQIILPHRPVQENVANAYEALGSYLEAKA
                   130       140       150       160       170       180

The complete length ORF19a nucleotide sequence <SEQ ID 107> is:



1 ATGAAAACCC CACCCCTCAA GCCTCTGCTC ATTACCTCGC TTCCCGTTTT 

51 CGCCAGTGTC TTTACCGCCG CCTCCATCGT CTGGCAGCTG GGCGAACCCA 

101 AGCTCGCCAT GCCCTTCGTA CTCGGCATCA TCGCTGGCGG CCTGGTCGAT 

151 TTGGACAACC GCCTGACCGG ACGGCTGAAA AACATCATCG CCACCGTCGC 

201 CCTGTTCACC CTCTCCTCAC TTGTCGCGCA AAGCACCCTC GGCACAGGTT 

251 TGCCATTCAT CCTCGCCATG ACCCTGATGA CTTTCGGCTT TACCATCATG 

301 GGCGCGGTCG GGCTGAAATA CCGCACCTTC GCCTTCGGCG CACTCGCCGT 

351 CGCCACCTAC ACCACACTTA CCTACACCCC CGAAACCTAC TGGCTGACCA 

401 ACCCCTTTAT GATTCTGTGC GGAACCGTAC TGTACAGCAC CGCCATCATC

451 CTGTTCCAAA TCATCCTGCC CCACCGCCCC GTTCAAGAAA ACGTCGCCAA 

501 CGCCTACGAA GCACTCGGCA GCTACCTCGA AGCCAAAGCC GACTTTTTCG 

551 ATCCCGACGA AGCCGAATGG ATAGGCAACC GCCACATCGA CCTCGCCATG 

601 AGCAACACCG GCGTCATCAC CGCCTTCAAC CAATGCCGTT CCGCCCTGTT 

651 TTACCGCCTT CGCGGCAAAC ACCGCCACCC GCGCACCGCC AAAATGCTGC 

701 GCTACTACTT CGCCGCCCAA GACATACACG AACGCATCAG CTCCGCCCAC 

751 GTCGACTACC AAGAGATGTC CGAAAAATTC AAAAACACCG ACATCATCTT 
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801 CCGCATCCAC CGCCTGCTCG AAATGCAGGG ACAAGCCTGC CGCAACACCG 

851 CCCAAGCCCT GCGCGCAAGC AAAGACTACG TTTACAGCAA ACGCCTCGGC 

901 CGCGCCATCG AAGGCTGCCG CCAATCGCTG CGCCTCCTTT CAGACAGCAA 

951 CGACAATCCC GACATCCGCC ACCTGCGCCG CCTTCTCGAC AACCTCGGCA 

1001 GCGTCGACCA GCAGTTCCGC CAACTCCAGC ACAACGGCCT GCAGGCAGAA 

1051 AACGACCGCA TGGGCGACAC CCGCATCGCC GCCCTCGAAA CCGGCAGCCT 

1101 CAAAAACACC TGGCAGGCAA TCCGTCCGCA GCTAAACCTC GAATCAGGCG

1151 TATTCCGCCA TGCCGTCCGC CTGTCCCTTG TCGTTGCCGC CGCCTGCACC 

1201 ATCGTCGAAG CCCTCAACCT CAACCTCGGC TACTGGATAC TACTGACCGC 

1251 CCTTTTCGTC TGCCAACCCA ACTACACCGC CACCAAAAGC CGCGTCCGCC 

1301 AGCGCATCGC CGGCACCGTA CTCGGCGTAA TCGTCGGCTC GCTCGTCCCC 

1351 TACTTTACCC CCTCCGTCGA AACCAAACTC TGGATCGTCA TCGCCAGTAC 

1401 CACCCTCTTT TTCATGACCC GCACCTACAA ATACAGCTTC TCGACATTTT 

1451 TCATCACCAT TCAAGCCCTG ACCAGCCTCT CCCTCGCAGG GTTGGACGTA

1501 TACGCCGCCA TGCCCGTACG CATCATCGAC ACCATTATCG GCGCATCCCT

1551 TGCCTGGGCG GCAGTCAGCT ACCTGTGGCC AGACTGGAAA TACCTCACGC 

1601 TCGAACGCAC CGCCGCCCTT GCCGTATGCA GCAACGGCGC CTATCTCGAA 

1651 AAAATCACCG AACGCCTCAA AAGCGGCGAA ACCGGCGACG ACGTCGAATA 

1701 CCGCGCCACC CGCCGCCGCG CCCACGAACA CACCGCCGCC CTCAGCAGCA 

1751 CCCTTTCCGA CATGAGCAGC GAACCCGCAA AATTCGCCGA CAGCCTGCAA

1801 CCCGGCTTTA CCCTGCTCAA AACCGGCTAC GCCCTGACCG GCTACATCTC 

1851 CGCCCTCGGC GCATACCGCA GCGAAATGCA CGAAGAATGC AGCCCCGACT 

1901 TTACCGCACA GTTCCACCTC GCCGCCGAAC ACACCGCCCA CATCTTCCAA 

1951 CACCTGCCCG AAACCGAACC CGACGACTTT CAGACAGCAC TGGATACACT

2001 GCGCGGCGAA CTCGACACCC TCCGCACCCA CAGCAGCGGA ACACAAAGCC 

2051 ACATCCTCCT CCAACAGCTC CAACTCATCG CCCGGCAGCT CGAACCCTAC 

2101 TACCGCGCCT ACCGACAAAT TCCGCACAGG CAGCCCCAAA ACGCAGCCTG 

2151 A 

This encodes a protein having amino acid sequence <SEQ ID 108>: 

1 MKTPPLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD

51 LDNRLTGRLK NIIATVALFT LSSLVAQSTL GTGLPFILAM TLMTFGFTIM

101 GAVGLKYRTF AFGALAVATY TTLTYTPETY WLTNPFMILC GTVLYSTAII

151 LFQIILPHRP VQENVANAYE ALGSYLEAKA DFFDPDEAEW IGNRHIDLAM

201 SNTGVITAFN QCRSALFYRL RGKHRHPRTA KMLRYYFAAQ DIHERISSAH

251 VDYQEMSEKF KNTDIIFRIH RLLEMQGQAC RNTAQALRAS KDYVYSKRLG

301 RAIEGCRQSL RLLSDSNDNP DIRHLRRLLD NLGSVDQQFR QLQHNGLQAE

351 NDRMGDTRIA ALETGSLKNT WQAIRPQLNL ESGVFRHAVR LSLVVAAACT

401 IVEALNLNLG YWILLTALFV CQPNYTATKS RVRQRIAGTV LGVIVGSLVP

451 YFTPSVETKL WIVIASTTLF FMTRTYKYSF STFFITIQAL TSLSLAGLDV

501 YAAMPVRIID TIIGASLAWA AVSYLWPDWK YLTLERTAAL AVCSNGAYLE

551 KITERLKSGE TGDDVEYRAT RRRAHEHTAA LSSTLSDMSS EPAKFADSLQ

601 PGFTLLKTGY ALTGYISALG AYRSEMHEEC SPDFTAQFHL AAEHTAHIFQ

651 HLPETEPDDF QTALDTLRGE LDTLRTHSSG TQSHILLQQL QLIARQLEPY

701 YRAYRQIPHR QPQNAA*

ORF19a and ORF19-1 show 98.3% identity in 716 aa overlap: 

                    10        20        30        40        50        60
orf19a.pep  MKTPPLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK
            |||| |||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf19a.pep  NIIATVALFTLSSLVAQSTLGTGLPFILAMTLMTFGFTIMGAVGLKYRTFAFGALAVATY
            ||| |||||||||| |||||||||||||||||||||||| ||||||||||||||||||||
orf19-1     NIITTVALFTLSSLTAQSTLGTGLPFILAMTLMTFGFTILGAVGLKYRTFAFGALAVATY
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf19a.pep  TTLTYTPETYWLTNPFMILCGTVLYSTAIILFQIILPHRPVQENVANAYEALGSYLEAKA
            ||||||||||||||||||||||||||||| |||| |||||||| ||||| ||| ||||||
orf19-1     TTLTYTPETYWLTNPFMILCGTVLYSTAILLFQIVLPHRPVQESVANAYDALGGYLEAKA
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf19a.pep  DFFDPDEAEWIGNRHIDLAMSNTGVITAFNQCRSALFYRLRGKHRHPRTAKMLRYYFAAQ
            |||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     DFFDPDEAAWIGNRHIDLAMSNTGVITAFNQCRSALFYRLRGKHRHPRTAKMLRYYFAAQ
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf19a.pep  DIHERISSAHVDYQEMSEKFKNTDIIFRIHRLLEMQGQACRNTAQALRASKDYVYSKRLG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     DIHERISSAHVDYQEMSEKFKNTDIIFRIHRLLEMQGQACRNTAQALRASKDYVYSKRLG
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf19a.pep  RAIEGCRQSLRLLSDSNDNPDIRHLRRLLDNLGSVDQQFRQLQHNGLQAENDRMGDTRIA
            |||||||||||||||||| |||||||||||||||||||||||||||||||||||||||||
orf19-1     RAIEGCRQSLRLLSDSNDSPDIRHLRRLLDNLGSVDQQFRQLQHNGLQAENDRMGDTRIA
                   310       320       330       340       350       360

                   370       380       390       400       410       420
orf19a.pep  ALETGSLKNTWQAIRPQLNLESGVFRHAVRLSLVVAAACTIVEALNLNLGYWILLTALFV
            |||| |||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     ALETSSLKNTWQAIRPQLNLESGVFRHAVRLSLVVAAACTIVEALNLNLGYWILLTALFV
                   370       380       390       400       410       420

                   430       440       450       460       470       480
orf19a.pep  CQPNYTATKSRVRQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIASTTLFFMTRTYKYSF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     CQPNYTATKSRVRQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIASTTLFFMTRTYKYSF
                   430       440       450       460       470       480

                   490       500       510       520       530       540
orf19a.pep  STFFITIQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAAL
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     STFFITIQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAAL
                   490       500       510       520       530       540

                   550       560       570       580       590       600
orf19a.pep  AVCSNGAYLEKITERLKSGETGDDVEYRATRRRAHEHTAALSSTLSDMSSEPAKFADSLQ
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     AVCSNGAYLEKITERLKSGETGDDVEYRATRRRAHEHTAALSSTLSDMSSEPAKFADSLQ
                   550       560       570       580       590       600

                   610       620       630       640       650       660
orf19a.pep  PGFTLLKTGYALTGYISALGAYRSEMHEECSPDFTAQFHLAAEHTAHIFQHLPETEPDDF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     PGFTLLKTGYALTGYISALGAYRSEMHEECSPDFTAQFHLAAEHTAHIFQHLPETEPDDF
                   610       620       630       640       650       660

                   670       680       690       700       710
orf19a.pep  QTALDTLRGELDTLRTHSSGTQSHILLQQLQLIARQLEPYYRAYRQIPHRQPQNAAX
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     QTALDTLRGELDTLRTHSSGTQSHILLQQLQLIARQLEPYYRAYRQIPHRQPQNAAX
                   670       680       690       700       710

Homology with a predicted ORF from N. gonorrhoeae 

ORF19 shows 95.1% identity over a 102aa overlap with a predicted ORF (ORF19.ng) from N. 
gonorrhoeae: 

orf19.pep   MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNXXTGRLK  60
            |||||||||||||||||||||||||||||||||||||||||||||||||||||  |||||
orf19ng     MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK  60

orf19.pep   NIITTVALFTLSSLTAQSTLGTGLPFILAMTLMTXXFTILGAX  103
            ||| ||||||||||||||||||||||||||||||  ||||||
orf19ng     NIIATVALFTLSSLTAQSTLGTGLPFILAMTLMTFGFTILGAVGLKYRTFAFGALAVATY  120

An ORF19ng nucleotide sequence <SEQ ID 109> is predicted to encode a protein having amino
acid sequence <SEQ ID 110>:

1 MKTPLLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD

51 LDNRLTGRLK NIIATVALFT LSSLTAQSTL GTGLPFILAM TLMTFGFTIL

101 GAVGLKYRTF AFGALAVATY TTLTYTPETY WLTNPFMILC GTVLYSTAII

151 LFQIILPHRP VQESVANAYE ALGGYLEAKA DFFDPDEAAW IGNRHIDLAM

201 SNTGVITAFN QCRSALFYRL RGKHRHPRTA KMLRYYFAAQ DIHERISSAH

251 VDYQEMSEKF KNTDIIFRIR RLLEMQGQAC RNTAQAIRSG KDYVYSKRLG

301 RAIEGCRQSL RLLSDGNDSP DIRHLSRLLD NLGSVDQQFR QLRHSDSPAE

351 NDRMGDTRIA ALETGSFKNT *

Further work revealed the complete nucleotide sequence <SEQ ID 111>:

1 ATGAAAACCC CACTCCTCAA GCCTCTGCTC ATTACCTCGC TTCCCGTTTT 

51 CGCCAGTGTC TTTACCGCCG CCTCCATCGT CTGGCAGCTA GGCGAACCCA 

101 AGCTCGCCAT GCCCTTCGTA CTCGGCATCA TCGCCGGCGG CCTGGTCGAT 

151 TTGGACAACC GCCTGACCGG ACGGCTGAAA AACATCATCG CCACCGTCGC 

201 CCTGTTTACC CTCTCCTCGC TCACGGCGCA AAGCACCCTC GGCACAGGGC 

251 TGCCCTTCAT CCTCGCCATG ACCCTGATGA CCTTCGGCTT TACCATTTTA 

301 GGCGCGGTCG GGCTGAAATA CCGCACCTTC GCCTTCGGCG CACTCGCCGT 

351 CGCCACCTAC ACCACGCTTA CCTACACCCC CGAAACCTAC TGGCTGACCA 

401 ACCCCTTCAT GATTTTATGC GGCACCGTAC TGTACAGCAC CGCCATCATC

451 CTGTTCCAAA TCATCCTGCC CCACCGCCCC GTCCAAGAAA GCGTCGCCAA

501 TGCCTACGAA GCACTCGGCG GCTACCTCGA AGCCAAAGCC GACTTCTTCG 

551 ACCCCGATGA GGCAGCCTGG ATAGGCAACC GCCACATCGA CCTCGCCATG 

601 AGCAACACCG GCGTCATCAC CGCCTTCAAC CAATGCCGTT CCGCCCTGTT 

651 TTACCGTTTG CGCGGCAAAC ACCGCCACCC GCGCACCGCC AAAATGCTGC 

701 GCTACTACTT CGCCGCCCAA GACATCCACG AACGCATCAG CTCCGCCCAC 

751 GTCGACTACC AAGAGATGTC CGAAAAATTC AAAAACACCG ACATCATCTT

801 CCGCATCCGC CGCCTGCTCG AAATGCAGGG GCAGGCGTGC CGCAACACCG 

851 CCCAAGCCAT CCGGTCGGGC AAAGACTAcg tTTACAGCAA ACGCCTCGGA 

901 CGCGCCATcg aaggctgCCG CCAGTCGCtg cgcctCCTTt cagacggcaA 

951 CGACAGTCCC GACATCCGCC ACCTGAGccg CCTTCTCGAC AACCTCGgca 

1001 GCGTcgacca gcagtTCcgc caactCCGAC ACAgcgactC CCCCGCcgaa 

1051 Aacgaccgca tgggcgacaC CCGCATCGCC GCCCtcgaaa ccggcagctT 

1101 caaaaaCAcc tggcaggCAA TCCGTCCGCa gctgaaCCTC GAATCatgCG 

1151 TATTCCGCCA TGCCGTCCGC CTGTCCCTCG TCGTTGCCGC CGCCTGCACC 

1201 ATCGTCgaag cCCTCAACCT CAACCTCGGC TACTGGATAC TGCTGACCGC 

1251 CCTTTTCGTC TGCCAACCCA ACTACACCGC CACCAAAAGC CGCGTGTACC 

1301 AACGCATCGC CGGCACCGTA CTCGGCGTAA TCGTCGGCTC GCTCGTCCCC 

1351 TACTTCACCC CCTCCGTCGA AACCAAACTC TGGATTGTCA TCGCCGGTAC 

1401 CACCCTGTTC TTCATGACCC GCACCTACAA ATACAGTTTC TCCACCTTCT

1451 TCATCACCAT TCAGGCACTG ACCAGCCTCT CCCTCGCAGG TTTGGACGTA

1501 TACGCCGCCA TGCCCGTGCG CATCATcgaC ACCATTATCG GCGCATCCCT 

1551 TGCCTGGGCG GCGGTCAGCT ACCTGTGGCC AGACTGGAAA TACCTCACGC 

1601 TCGAACGCAC CGCCGCCCTT GCCGTATGCA GCAGCGGCAC ATACCTCCAA 

1651 AAAATTGCCG AACGCCTCAA AACCGGCGAA ACCGGCGACG ACATAGAATA 

1701 CCGCATCACC CGCCGCCGCG CCCACGAACA CACCGCCGCC CTCAGCAGCA

1751 CCCTTTCCGA CATGAGCAGC GAACCCGCAA AATTCGCCGA CAGCCTGCAA

1801 CCCGGCTTTA CCCTGCTCAA AACCGGCTAC GCCCTGACCG GCTACATCTC 

1851 CGCCCTCGGC GCATACCGCA GCGAAATGCA CGAAGAATGC AGCCCCGACT 

1901 TTACCGCACA GTTCCACCTT GCCGCCGAAC ACACCGCCCA CATCTTCCAA 

1951 CACCTGCCCG ACATGGGACC CGACGACTTT CAGACGGCAT TGGATACACT 

2001 GCGCGGCGPJi CTCGGCACCC TCCGCACCCG CAGCAGCGGA ACACAAAGCC 

2051 ACATCCTCCT CCAACAGCTC CAACTCATCG CccgGCAACT CGAACCCTAC 

2101 TACCGCGCCT ACCGACAAAT TCCGCACAGG CAGCCCCAAA ACGCAGCCTG 

2151 A 

This corresponds to the amino acid sequence <SEQ ID 112; ORF19ng-1>:

1 MKTPLLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD

51 LDNRLTGRLK NIIATVALFT LSSLTAQSTL GTGLPFILAM TLMTFGFTIL

101 GAVGLKYRTF AFGALAVATY TTLTYTPETY WLTNPFMILC GTVLYSTAII

151 LFQIILPHRP VQESVANAYE ALGGYLEAKA DFFDPDEAAW IGNRHIDLAM 

201 SNTGVITAFN QCRSALFYRL RGKHRHPRTA KMLRYYFAAQ DIHERISSAH 

251 VDYQEMSEKF KNTDIIFRIR RLLEMQGQAC RNTAQAIRSG KDYVYSKRLG 

301 RAIEGCRQSL RLLSDGNDSP DIRHLSRLLD NLGSVDQQFR QLRHSDSPAE 

351 NDRMGDTRIA ALETGSFKNT WQAIRPQLNL ESCVFRHAVR LSLVVAAACT 

401 IVEALNLNLG YWILLTALFV CQPNYTATKS RVYQRIAGTV LGVIVGSLVP

451 YFTPSVETKL WIVIAGTTLF FMTRTYKYSF STFFITIQAL TSLSLAGLDV

501 YAAMPVRIID TIIGASLAWA AVSYLWPDWK YLTLERTAAL AVCSSGTYLQ 

551 KIAERLKTGE TGDDIEYRIT RRRAHEHTAA LSSTLSDMSS EPAKFADSLQ 
601 PGFTLLKTGY ALTGYISALG AYRSEMHEEC SPDFTAQFHL AAEHTAHIFQ
651 HLPDMGPDDF QTALDTLRGE LGTLRTRSSG TQSHILLQQL QLIARQLEPY
701 YRAYRQIPHR QPQNAA*

ORF19ng-1 and ORF19-1 show 95.5% identity in 716 aa overlap:

                     10        20        30        40        50        60
orf19-1.pep  MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19ng-1    MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf19-1.pep  NIITTVALFTLSSLTAQSTLGTGLPFILAMTLMTFGFTILGAVGLKYRTFAFGALAVATY
             ||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19ng-1    NIIATVALFTLSSLTAQSTLGTGLPFILAMTLMTFGFTILGAVGLKYRTFAFGALAVATY
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf19-1.pep  TTLTYTPETYWLTNPFMILCGTVLYSTAILLFQIVLPHRPVQESVANAYDALGGYLEAKA
             ||||||||||||||||||||||||||||| |||| |||||||||||||| ||||||||||
orf19ng-1    TTLTYTPETYWLTNPFMILCGTVLYSTAIILFQIILPHRPVQESVANAYEALGGYLEAKA
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf19-1.pep  DFFDPDEAAWIGNRHIDLAMSNTGVITAFNQCRSALFYRLRGKHRHPRTAKMLRYYFAAQ
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19ng-1    DFFDPDEAAWIGNRHIDLAMSNTGVITAFNQCRSALFYRLRGKHRHPRTAKMLRYYFAAQ
                    190       200       210       220       230       240

                    250       260       270       280       290       300
orf19-1.pep  DIHERISSAHVDYQEMSEKFKNTDIIFRIHRLLEMQGQACRNTAQALRASKDYVYSKRLG
             ||||||||||||||||||||||||||||| |||||||||||||||| |  ||||||||||
orf19ng-1    DIHERISSAHVDYQEMSEKFKNTDIIFRIRRLLEMQGQACRNTAQAIRSGKDYVYSKRLG
                    250       260       270       280       290       300

                    310       320       330       340       350       360
orf19-1.pep  RAIEGCRQSLRLLSDSNDSPDIRHLRRLLDNLGSVDQQFRQLQHNGLQAENDRMGDTRIA
             ||||||||||||||| ||||||||| |||||||||||||||| |    ||||||||||||
orf19ng-1    RAIEGCRQSLRLLSDGNDSPDIRHLSRLLDNLGSVDQQFRQLRHSDSPAENDRMGDTRIA
                    310       320       330       340       350       360

                    370       380       390       400       410       420
orf19-1.pep  ALETSSLKNTWQAIRPQLNLESGVFRHAVRLSLVVAAACTIVEALNLNLGYWILLTALFV
             |||| | ||||||||||||||| |||||||||||||||||||||||||||||||||||||
orf19ng-1    ALETGSFKNTWQAIRPQLNLESCVFRHAVRLSLVVAAACTIVEALNLNLGYWILLTALFV
                    370       380       390       400       410       420

                    430       440       450       460       470       480
orf19-1.pep  CQPNYTATKSRVRQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIASTTLFFMTRTYKYSF
             |||||||||||| |||||||||||||||||||||||||||||||| ||||||||||||||
orf19ng-1    CQPNYTATKSRVYQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIAGTTLFFMTRTYKYSF
                    430       440       450       460       470       480

                    490       500       510       520       530       540
orf19-1.pep  STFFITIQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAAL
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19ng-1    STFFITIQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAAL
                    490       500       510       520       530       540

                    550       560       570       580       590       600
orf19-1.pep  AVCSNGAYLEKITERLKSGETGDDVEYRATRRRAHEHTAALSSTLSDMSSEPAKFADSLQ
             |||| | || || |||| |||||| ||| |||||||||||||||||||||||||||||||
orf19ng-1    AVCSSGTYLQKIAERLKTGETGDDIEYRITRRRAHEHTAALSSTLSDMSSEPAKFADSLQ
                    550       560       570       580       590       600

                    610       620       630       640       650       660
orf19-1.pep  PGFTLLKTGYALTGYISALGAYRSEMHEECSPDFTAQFHLAAEHTAHIFQHLPETEPDDF
             |||||||||||||||||||||||||||||||||||||||||||||||||||||   ||||
orf19ng-1    PGFTLLKTGYALTGYISALGAYRSEMHEECSPDFTAQFHLAAEHTAHIFQHLPDMGPDDF
                    610       620       630       640       650       660

                    670       680       690       700       710
orf19-1.pep  QTALDTLRGELDTLRTHSSGTQSHILLQQLQLIARQLEPYYRAYRQIPHRQPQNAAX
             ||||||||||| |||| ||||||||||||||||||||||||||||||||||||||||
orf19ng-1    QTALDTLRGELGTLRTRSSGTQSHILLQQLQLIARQLEPYYRAYRQIPHRQPQNAAX
                    670       680       690       700       710
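
The identity figures quoted for these alignments can be checked by counting identical positions over the aligned overlap. The following Python sketch is illustrative only: it assumes two pre-aligned strings of equal length with '-' marking gaps, the function name `percent_identity` is hypothetical, and the example data are residues 61-120 of the ORF19-1/ORF19ng-1 alignment above.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over two pre-aligned sequences of equal length.

    Assumption: '-' marks a gap and never counts as a match.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    if not seq_a:
        raise ValueError("alignment is empty")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    return 100.0 * matches / len(seq_a)


# Residues 61-120 of the alignment: one substitution (T versus A at residue 64).
ORF19_1 = "NIITTVALFTLSSLTAQSTLGTGLPFILAMTLMTFGFTILGAVGLKYRTFAFGALAVATY"
ORF19NG_1 = "NIIATVALFTLSSLTAQSTLGTGLPFILAMTLMTFGFTILGAVGLKYRTFAFGALAVATY"

if __name__ == "__main__":
    value = percent_identity(ORF19_1, ORF19NG_1)
    print(f"{value:.1f}% identity over {len(ORF19_1)} aa")
```

By the same arithmetic, 684 identical positions in a 716 aa overlap would correspond to the quoted 95.5%.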

In addition, ORF19ng-1 shows significant homology to a hypothetical gonococcal protein
previously entered in the databases:

sp|O33369|YOR2_NEIGO HYPOTHETICAL 45.5 KD PROTEIN (ORF2) gnl|PID|e1154438
(AJ002423) hypothetical protein [Neisseria gonorrh] Length = 417
Score = 1512 (705.6 bits), Expect = 5.3e-203, P = 5.3e-203
Identities = 301/326 (92%), Positives = 306/326 (93%)

Query: 307 RQSLRLLSDGNDSPDIRHLSRLLDNLGSVDQQFRQLRHSDSPAENDRMGDTRIAALETGS 366
           RQSLRLLSDGNDS DIRHLSRLLDNLGSVDQQFRQLRHSDSPAENDRMGDTRIAALETGS
Sbjct: 1   RQSLRLLSDGNDSXDIRHLSRLLDNLGSVDQQFRQLRHSDSPAENDRMGDTRIAALETGS 60

Query: 367 FKNTWQAIRPQLNLESCVFRHAVRLSLVVAAACTIVEALNLNLGYWILLTALFVCQPNYT 426
           FKNTWQAIRPQLNLES VFRHAVRLSLVVAAACTIVEALNLNLGYWILLT LFVCQPNYT
Sbjct: 61  FKNTWQAIRPQLNLESGVFRHAVRLSLVVAAACTIVEALNLNLGYWILLTRLFVCQPNYT 120

Query: 427 ATKSRVYQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIAGTTLFFMTRTYKYSFSTFFIT 486
           ATKSRVYQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIAGTTLFFMTRTYKYSFSTFFIT
Sbjct: 121 ATKSRVYQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIAGTTLFFMTRTYKYSFSTFFIT 180

Query: 487 IQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAALAVCSSG 546
           IQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAALAVCSSG
Sbjct: 181 IQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAALAVCSSG 240

Query: 547 TYLQKIAERLKTGETGDDIEYRITRRRAHEHTAALSSTLSDMSSEPAKFADSLQPGFTLL 606
           TYLQKIAERLKTGETGDDIEYRITRRRAHEHTAALSSTLSDMSSEPAKFAD+  P
Sbjct: 241 TYLQKIAERLKTGETGDDIEYRITRRRAHEHTAALSSTLSDMSSEPAKFADTCNPALPCS 300



Query: 607 KTGYALTGYISALGAYRSEMHEECSP 632

K ALTGYISALG + +P 

Sbjct: 301 KPATALTGYISALGHTAAKCTKNAAP 326
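
The percentages in the hit summary above are consistent with integer truncation of the count ratios rather than rounding (306/326 is about 93.9% but is reported as 93%). A small Python sketch of that arithmetic follows; the truncation convention is an assumption inferred from the figures, and the function name `truncated_percent` is hypothetical.

```python
def truncated_percent(count: int, total: int) -> int:
    """Integer percentage truncated toward zero (assumed reporting convention)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return (100 * count) // total


# Counts taken from the hit summary: Identities = 301/326, Positives = 306/326.
IDENTITIES = (301, 326)
POSITIVES = (306, 326)

if __name__ == "__main__":
    print(truncated_percent(*IDENTITIES), truncated_percent(*POSITIVES))
```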

Based on this analysis, including the presence of several putative transmembrane domains in the
gonococcal protein (the first of which is also seen in the meningococcal protein), and on homology
with the YHFK protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 14 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 
113>: 

1 ATGAATATGC TGGGAGCTTT GGCAAAAGTC GGCAGCCTGA CGATGGTGTC

51 GCGCGTTTTG GGATTTGTGC GCGATACGGT CATTGCGCGG GCATTCGGCG 

101 CGGGTATGGC GACGGATGCG TTTTTTGTCG CGTTCAAACT GCCCAACCTG 

151 CTTCGCCGCG TGTTTGCGGA GGGGGCGTTT GCCCAAGCGT TTGTGCCGAT 

201 TTTGGCGGAA TACAAGGAAA CGCGTTCAAA AGAGGCGG.C GAAGCCTTTA 

251 TCCGCCATGT GGCGGGGATG CTGTCGTTTG TACTGGTTAT CGTTACCGCG

301 CTGGGCATAC TTGCCGCGCC TTGGGTGATT TATGTTTCCG CACCCgAGTT 

351 TTGCCCAAGA TGCCGACAAA TTTCAGCTCT CCATCGATTT GCTGCGGATT 

401 ACGTTTCCTT ATATATTATT GATTTCCCTG TCTTCATTTG TCGGCTCGGT 

451 ACTCAATTCT TATCATAAGT TCGGCATTCC GGCGTTTACG CCAC.GTTTC 

501 TGAACGTGTC GTTTATCGTA TTCGCGCTGT TTTTCGTGCC GTATTTCGAT

551 CCGCCCGTTA CCGCGCyGGC GTGGGCGGTC TTTGTCGGCG GCATTTTGCA

601 ACTCGrmTTC CAACTGCCCT GGCTGGCGAA ACTGGGCTTT TTGAAACTGC 

651 CCAAACtGAG TTTCAAAGAT GCGGCGGTCA ACCGCGTGAT GAAACAGATG 

701 GCGCCTGCgA TTTTgGGCGT GAgCGTGGCG CAGGTTTCTT TGGTGATCAA 

751 CACGATTTTc GCGTCTTATC TGCAATCGGG CAGCGTTTCA TGGATGTATT 

801 ACGCCGACCG CATGATGGAG CTGCCCAGCG GCGTGCTGGG GGCGGCACTC 

851 GGTACGATTT TGCTGCCGAC TTTGTCCAAA CACTCGGCAA ACCaAGATAC 

901 GGaACAGTTT TCCGCCCTGC TCGACTGGGG TTTGCGCCTG TGCATGCtgc 

951 TGACGCTGCC GGCGgcGGTC GGACTGGCGG TGTTGTCGTT cCCgCtGGTG 

1001 GCGACGCTGT TTATGTACCG CGwATTTACG CTGTTTGACG CGCAGATGAC 

1051 GCAACACGCG CTGATTGCCT ATTCTTTCGG TTTAATCGGC TTAATCATGA 

1101 TTAAAGTGTT GGCACCCGGC TTCTATGCGC GGCAAAACAT CAAwAmGCCC 

1151 GTCAAAATCG CCATCTTCAC GCTCATCTGC mCGCAGTTGA TGAACCTTGs 

1201 CTTTAyCGGC CCACTrrAAC rCa^TCGGAC TTTCGCTTGC CATCGGTCTG

1251 GGCGCGTGTA TCAATGCCGG ATTGTTGTTT TACCTGTTGC GCAGACACGG 

1301 TATTTACCAA CCTGG.CAAG GGTTGGGCAG CGTTCTT.AG CAAAAATGCT

1351 GcTCTCGCTC GCCGTGA 

This corresponds to the amino acid sequence <SEQ ID 114; ORF20>:

1 MNMLGALAKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL 

51 LRRVFAEGAF AQAFVPILAE YKETRSKEAX EAFIRHVAGM LSFVLVIVTA 

101 LGILAAPWVI YVSAPSFAQD ADKFQLSIDL LRITFPYILL ISLSSFVGSV 

151 LNSYHKFGIP AFTPXFLNVS FIVFALFFVP YFDPPVTAXA WAVFVGGILQ 

201 LXFQLPWLAK LGFLKLPKLS FKDAAVNRVM KQMAPAILGV SVAQVSLVIN 

251 TIFASYLQSG SVSWMYYADR MMELPSGVLG AALGTILLPT LSKHSANQDT 

301 EQFSALLDWG LRLCMLLTLP AAVGLAVLSF PLVATLFMYR XFTLFDAQMT 

351 QHALIAYSFG LIGLIMIKVL APGFYARQNI XXPVKIAIFT LICXQLMNLX 

401 FXGPLXXIGL SLAIGLGACI NAGLLFYLLR RHGIYQPXQG LGSVLXQKCC

451 SRSP* 
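
The X residues in the ORF20 translation appear to coincide with codons of SEQ ID 113 that contain an unread base ('.') or an ambiguity code (for example 'y', 'r', 'm', 'w' or 's'). The Python sketch below illustrates this under stated assumptions: it uses the standard genetic code, treats any codon containing a character other than A, C, G or T as X (even where the ambiguity would not change the residue), and the name `translate` is hypothetical. It is not the method used to generate the sequences here.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; codons with non-ACGT characters become X."""
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


# Bases 211-249 of SEQ ID 113 as printed above, including the unread base.
FRAGMENT = "TACAAGGAAA CGCGTTCAAA AGAGGCGG.C GAAGCCTTT"

if __name__ == "__main__":
    print(translate(FRAGMENT))
```

In this fragment the codon `G.C` is the only one that cannot be resolved, which matches the single X at residue 80 of the ORF20 sequence.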

These sequences were elaborated, and the complete DNA sequence <SEQ ID 115> is:

1 ATGAATATGC TGGGAGCTTT GGCAAAAGTC GGCAGCCTGA CGATGGTGTC 

51 GCGCGTTTTG GGATTTGTGC GCGATACGGT CATTGCGCGG GCATTCGGCG 

101 CGGGTATGGC GACGGATGCG TTTTTTGTCG CGTTCAAACT GCCCAACCTG 

151 CTTCGCCGCG TGTTTGCGGA GGGGGCGTTT GCCCAAGCGT TTGTGCCGAT 

201 TTTGGCGGAA TACAAGGAAA CGCGTTCAAA AGAGGCGGCG GAGGCTTTTA 

251 TCCGCCATGT GGCGGGGATG CTGTCGTTTG TACTGGTTAT CGTTACCGCG

301 CTGGGCATAC TTGCCGCGCC TTGGGTGATT TATGTTTCCG CACCCGGTTT 

351 TGCCCAAGAT GCCGACAAAT TTCAGCTCTC CATCGATTTG CTGCGGATTA 

401 CGTTTCCTTA TATATTATTG ATTTCCCTGT CTTCATTTGT CGGCTCGGTA

451 CTCAATTCTT ATCATAAGTT CGGCATTCCG GCGTTTACGC CCACGTTTCT 

501 GAACGTGTCG TTTATCGTAT TCGCGCTGTT TTTCGTGCCG TATTTCGATC 

551 CGCCCGTTAC CGCGCTGGCG TGGGCGGTCT TTGTCGGCGG CATTTTGCAA 

601 CTCGGCTTCC AACTGCCCTG GCTGGCGAAA CTGGGCTTTT TGAAACTGCC 

651 CAAACTGAGT TTCAAAGATG CGGCGGTCAA CCGCGTGATG AAACAGATGG 

701 CGCCTGCGAT TTTGGGCGTG AGCGTGGCGC AGGTTTCTTT GGTGATCAAC

751 ACGATTTTCG CGTCTTATCT GCAATCGGGC AGCGTTTCAT GGATGTATTA 

801 CGCCGACCGC ATGATGGAGC TGCCCAGCGG CGTGCTGGGG GCGGCACTCG

851 GTACGATTTT GCTGCCGACT TTGTCCAAAC ACTCGGCAAA CCAAGATACG 

901 GAACAGTTTT CCGCCCTGCT CGACTGGGGT TTGCGCCTGT GCATGCTGCT 

951 GACGCTGCCG GCGGCGGTCG GACTGGCGGT GTTGTCGTTC CCGCTGGTGG 

1001 CGACGCTGTT TATGTACCGC GAATTTACGC TGTTTGACGC GCAGATGACG 

1051 CAACACGCGC TGATTGCCTA TTCTTTCGGT TTAATCGGCT TAATCATGAT 

1101 TAAAGTGTTG GCACCCGGCT TCTATGCGCG GCAAAACATC AAAACGCCCG 

1151 TCAAAATCGC CATCTTCACG CTCATCTGCA CGCAGTTGAT GAACCTTGCC 

1201 TTTATCGGCC CACTGAAACA CGTCGGACTT TCGCTTGCCA TCGGTCTGGG 

1251 CGCGTGTATC AATGCCGGAT TGTTGTTTTA CCTGTTGCGC AGACACGGTA 

1301 TTTACCAACC TGGCAAGGGT TGGGCAGCGT TCTTAGCAAA AATGCTGCTC 

1351 TCGCTCGCCG TGATGTGCGG CGGACTGTGG GCAGCGCAGG CTTACCTGCC 

1401 GTTTGAATGG GCGCACGCCG GCGGAATGCG GAAAGCGGGG CAGCTCTGCA

1451 TCCTGATTGC CGTCGGCGGC GGACTGTATT TCGCATCACT GGCGGCTTTG

1501 GGCTTCCGTC CGCGCCATTT CAAACGCGTG GAAAACTGA 

This corresponds to the amino acid sequence <SEQ ID 116; ORF20-1>:

1 MNMLGALAKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL 

51 LRRVFAEGAF AQAFVPILAE YKETRSKEAA EAFIRHVAGM LSFVLVIVTA 

101 LGILAAPWVI YVSAPGFAQD ADKFQLSIDL LRITFPYILL ISLSSFVGSV

151 LNSYHKFGIP AFTPTFLNVS FIVFALFFVP YFDPPVTALA WAVFVGGILQ

201 LGFQLPWLAK LGFLKLPKLS FKDAAVNRVM KQMAPAILGV SVAQVSLVIN

251 TIFASYLQSG SVSWMYYADR MMELPSGVLG AALGTILLPT LSKHSANQDT

301 EQFSALLDWG LRLCMLLTLP AAVGLAVLSF PLVATLFMYR EFTLFDAQMT

351 QHALIAYSFG LIGLIMIKVL APGFYARQNI KTPVKIAIFT LICTQLMNLA

401 FIGPLKHVGL SLAIGLGACI NAGLLFYLLR RHGIYQPGKG WAAFLAKMLL

451 SLAVMCGGLW AAQAYLPFEW AHAGGMRKAG QLCILIAVGG GLYFASLAAL

501 GFRPRHFKRV EN* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with the MviN virulence factor of S. typhimurium (accession number P37169) 

ORF20 and MviN proteins show 63% aa identity in 440aa overlap: 

MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 60 
MN+L +LA V S+TM SRVLGF RD ++AR FG AGMAT DA F FVA FK L PN L L RR + FAE G A F 
MNLLKSLAAVSSMTMFSRVLGFARDAIVARIFGAGMATDAFFVAFKLPNLLRRIFAEGAF 73 

AQ AFV P I L AE YKE T R S KE AXE AF I RHV AGML SFVLVIVTALGI L AAP WV I Y VS A P S FAQ D 120 
-f-QAFVPILAEYK + +EA F+ +V+G+L+ L +VT G+LAAPWVI V+AP FA 
S Q AFV P I L AE YKS KQGE E AT R I FVAY V S G LLT LALA VVT VAGMLAAPWV I MVTAPG FADT 133 

ADKFQLS I DLLRITFPYILL I SLSSFVGSVLNSYHKFGIPAFTPXFLNVS FIVFALFFVP 180 
ADKF L+ LLRITFPYILLISL+S VG++LN++++F IPAF P FLN+S I FALF P 



Orf20. 


1 


MviN 


14 


Orf20 


61 


MviN 


74 


Orf20 


121 


MviN 


134 


Orf20 


181 


MviN 


194 


Orf20 


241 


MviN 


254 


Orf20 


301 


MviN 


314 


Orf20 


361 


MviN 


374 


Orf20 


421 


MviN 


434 



YF+PPV A AWAV VGG+LQL +QLP+L K+G L LP+++F+D 



RV+KQM PAILGV 



SV+Q+SL+INTIFAS+L SGSVSWMYYADR+ME PSGVLG ALGTILLP+LSK A+ + 



+++ L+DWGLRLC LL LP+AV L +L+ PL +LF Y FT FDA MTQ ALIAYS G 



LIGLI++KVLAPGFY+RQ+I PVKIAI TLI QLMNL F 



NA LL++ LR+ 1+ P 



C+ 






Homology with a predicted ORF from N. meningitidis (strain A) 

ORF20 shows 93.5% identity over a 447aa overlap with an ORF (ORF20a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf 20 . pep MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 
I I I I I I I : II I I I I I I I I II I I I I I I I I I I I I I I It I II I I I II ( I t I I I I I I M I I I i I 
orf 20a MNMLG AL VKVG S LTMV S R V LG FVRDT V I ARAFG AGMAT D AF FVAFKL PN LLRR V FAE G A F 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 20 . pep AQAFVPILAEYKETRSKEAXEAFIRHVAG MLSFVLVIVTALGILAA PWVIYVSAPSFAQD 
I M I II I I I I I M II I I I I : I I I I I II 1 II I I M I M I I I I I I II I I ! II I I I I I : I I : I 
orf 20a AQAFVPILAEYKETRSKEATEAFIRHVAG MLSFVLVIVTALGILAA PWVIYVSAPGFAKD 

70 80 90 100 110 120 
130 140 150 160 170 180

ADKFOLSIDLLRIT FPYILLI5LSSFVGSVL NSYHKFGIPAFTPX FLNVSFIVFALFFVP 

| M M | | I ! I I I I M I ! I 11 ! I II M I I I I 1 I 1 I M I : I ! I I I I : i i t ) I ! I i I I 1 i I i I 
ADKFOLSIDLLRIT FPYILLISLSSFVGSVL NSYHKFSIPAFTPT FLNVSFIVFALFFVP 
130 140 150 160 170 180 

190 200 210 220 230 240 

YFDPP VTAXAWAVFVGGILQLX FQLPWLAKLGFLKLPKLSFKDAAVNRVMKQ MAPAILGV 
TTl I till I I I i I It I I I I I I I I I I 1 I I I I I I I I I I I I I H I I I I I I I I I I I I I I I I I 
Y F D P P V T AL AW AV FVGG I LQL G FQL P WL AKLG F LKL PK L S FK D AAVN R VMKQ MA PAI LG V 
190 200 210 220 230 240 

250 260 270 280 290 300 

SVAQVSLVI NT I FAS YLQS GS VS WMYYADRMMELPSGVLGAALGT I LLPTLSKH SANQDT 

M M : | | M I I I II I I I I M I I I I I I I I I I I I I I I : I I I I I I I! I I 1 I I I ! I I I I M 1 I I 
SVAQISLVI NTIFASYLQSGSVSWMYYADRMMELPGGVLGAALGTILLPTLSKHSANQDT 
250 260 270 280 290 300 

310 320 330 340 350 360 

EQFSALLDWGLR LCMLLTLPAAVGLAVLS FPLVATLFMYRXFTLFDAQMTQH ALIAYSFG 
I I I I I I I I I I I I II I I i I II I I I : I I I I I I I II I I I M I I I II I I II I I I M I M M I 
EQFSALLDWGLR XCMLLTLPAAVGMAVL5 FPLVATLFMYREFTLFDAQMTQHA LIAYSFG 
310 320 330 340 350 360 

370 380 390 400 410 420 

L I G L IM I KVLA PG FYARQN I XX P VK I AI FT L I CXQLMN LX FX G PLXX I GL S LAI GLGAC I 
I II I I II I I I M I I I 1 I I I I : II I I I I I I I I I : I I I 11 I III : I I I I II I I I I I I 
LIGLIMIKVL APGFYARQHIKTPVK IAIFTLICTQLMNLAFI GPLKHVGLS LAIGLGACI 
370 380 390 400 410 420 

430 440 450 

NAGLLFYL LRRHGIYQPXQGLGSVLXQKCCSRSPX 
I i I I I I t I II I I I I I I I : I : : I ■ 

NAG LLFYL LRRHGIYQ PGKGW A A FL AKML L S LAVMGGGL Y AAQ I W L P FDW AHAGGM QKAA 
430 440 450 460 470 480 

The complete length ORF20a nucleotide sequence <SEQ ID 117> is:

1 ATGAATATGC TGGGAGCTTT GGTAAAAGTC GGCAGCCTGA CGATGGTGTC 

51 GCGCGTTTTG GGATTTGTGC GCGATACGGT CATTGCGCGC GCATTCGGCG 

101 CAGGCATGGC GACGGATGCG TTCTTTGTCG CGTTCAAACT GCCCAACCTG 

151 CTTCGCCGCG TGTTTGCGGA GGGGGCGTTT GCCCAAGCGT TTGTGCCGAT 

201 TTTGGCGGAA TATAAGGAAA CGCGTTCTAA AGAGGCGACG GAGGCTTTTA 

251 TCCGCCATGT GGCGGGGATG CTGTCGTTTG TACTGGTCAT CGTTACCGCG 

301 CTGGGCATAC TTGCCGCGCC TTGGGTGATT TATGTTTCCG CACCCGGTTT 

351 TGCCAAAGAT GCCGACAAAT TTCAGCTCTC TATCGATTTG CTGCGGATTA 

401 CGTTTCCTTA TATCTTATTG ATTTCACTTT CCTCTTTTGT CGGCTCGGTA

451 CTCAATTCCT ATCATAAATT CAGCATTCCT GCGTTTACGC CCACGTTCCT

501 GAACGTGTCG TTTATCGTAT TCGCGCTGTT TTTCGTGCCG TATTTCGATC 

551 CTCCCGTTAC CGCGCTGGCT TGGGCGGTTT TTGTCGGCGG CATTTTGCAA 

601 CTCGGCTTCC AACTGCCCTG GCTGGCGAAA CTGGGTTTTT TGAAACTGCC 

651 CAAACTGAGT TTCAAAGATG CGGCGGTCAA CCGCGTGATG AAACAGATGG 

701 CGCCTGCGAT TTTGGGCGTG AGCGTGGCGC AGATTTCTTT GGTGATCAAC 

751 ACGATTTTCG CGTCTTATCT GCAATCGGGC AGCGTTTCAT GGATGTATTA 

801 CGCCGACCGC ATGATGGAAC TGCCCGGCGG CGTGCTGGGG GCGGCACTCG 

851 GTACGATTTT GCTGCCGACT TTGTCCAAAC ACTCGGCAAA CCAAGATACG 

901 GAACAGTTTT CCGCCCTGCT CGACTGGGGT TTGCGCNTGT GCATGCTGCT 

951 GACGCTGCCG GCGGCGGTCG GAATGGCGGT GTTGTCGTTC CCGCTGGTGG 

1001 CAACCTTGTT TATGTACCGA GAATTCACGC TGTTTGACGC GCAGATGACG 

1051 CAACACGCGC TGATTGCCTA TTCTTTCGGT TTAATCGGTT TAATCATGAT 

1101 TAAAGTGTTG GCGCCCGGCT TTTATGCGCG GCAAAACATC AAAACGCCCG 

1151 TCAAAATCGC CATCTTCACG CTCATTTGCA CGCAGTTGAT GAACCTTGCC 

1201 TTTATCGGCC CACTGAAACA CGTCGGACTT TCGCTTGCCA TCGGTCTGGG

1251 CGCGTGTATC AATGCCGGAT TGTTGTTTTA CCTGTTGCGC AGACACGGTA 

1301 TTTACCAACC TGGCAAGGGT TGGGCAGCGT TCTTGGCAAA AATGCTGCTC 

1351 TCGCTCGCCG TGATGGGAGG CGGCCTGTAT GCCGCCCAAA TCTGGCTGCC 

1401 GTTCGACTGG GCACACGCCG GCGGAATGCA AAAGGCCGCC CGGCTCTTCA

1451 TCCTGATTGC CGTCGGCGGC GGACTGTATT TCGCATCACT GGCGGCTTTG

1501 GGCTTCCGTC CGCGCCATTT CAAACGCGTG GAAAGCTGA 



orf20 .pep 
orf20a 

orf 20 . pep 
orf20a 

orf 20 .pep 
orf20a 

orf 20 .pep 
orf20a 

orf20 .pep 
orf20a 

orf 20 .pep 
orf20a 
This encodes a protein having amino acid sequence <SEQ ID 118>:

1 MNMLGALVKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL

51 LRRVFAEGAF AQAFVPILAE YKETRSKEAT EAFIRHVAGM LSFVLVIVTA

101 LGILAAPWVI YVSAPGFAKD ADKFQLSIDL LRITFPYILL ISLSSFVGSV

151 LNSYHKFSIP AFTPTFLNVS FIVFALFFVP YFDPPVTALA WAVFVGGILQ

201 LGFQLPWLAK LGFLKLPKLS FKDAAVNRVM KQMAPAILGV SVAQISLVIN

251 TIFASYLQSG SVSWMYYADR MMELPGGVLG AALGTILLPT LSKHSANQDT

301 EQFSALLDWG LRXCMLLTLP AAVGMAVLSF PLVATLFMYR EFTLFDAQMT

351 QHALIAYSFG LIGLIMIKVL APGFYARQNI KTPVKIAIFT LICTQLMNLA

401 FIGPLKHVGL SLAIGLGACI NAGLLFYLLR RHGIYQPGKG WAAFLAKMLL

451 SLAVMGGGLY AAQIWLPFDW AHAGGMQKAA RLFILIAVGG GLYFASLAAL

501 GFRPRHFKRV ES*

ORF20a and ORF20-1 show 96.5% identity in 512 aa overlap:

10 20 30 40 50 60 

15 or f 20a pep MNMLGALVKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFBVAFKLPNLLRRVFAEGAF 

I I I I M I *• 1 I I 11 1 I I II I I I U It U M II I M I I I I I M I M I I II M I I I II I I I II 
orf 20-1 MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 

10 20 30 40 50 60 

20 70 80 90 100 110 120 

or f 20a . pep AQAFVPILAEYKETRSKEATEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPGFAKD 

I M I M I ( I M II M i I I I : I I I II I I I M I I I't I I I I I I I I I I I M I M I M I I II I : I 
orf 20-1 AQAFVPILAE YKETRSKEAAEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPGFAQD 

70 80 90 100 110 120 

25 

130 140 150 160 170 180 

orf20a.pep ADKFQLS I DLLRITFPYILLISLSSFVGSVLNSYHKFS I PAFTPTFLNVS FIVFALFFVP 

II I II I M I I I I II I M I I M I M I ! I I ! ! I II I M I : I I M I M I I II I I I M I ! I 1 I I 
orf20-l ADKFQLS I DLLRITFPYILLISLSSFVGSVLNSYHKFG I PAFTPTFLNVS FIVFALFFVP 

30 130 140 150 160 170 180 

190 200 210 220 230 240 

or f 20a . pep YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV 
I I ( I I II I I I M I I I I I I I M I M I I I I I I I I I I I I I I M I I I II II t II II I I I! M M 
35 orf20-l YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf20a.pep SVAQISLVINTIFASYLQSGSVSWMYYADRMMELPGGVLGAALGTILLPTLSKHSANQDT 
40 I I II : M I I I I I I II I I I I I I M I I I I I I I I I I I I : I M I I I I I I I I I I I I I I I I I I I I I 

orf 20-1 SVAQVSLVINTIFASYLQSGSVSWMYYADRMMELPSGVLGAALGTILLPTLSKHSANQDT 

250 260 270 280 290 300 

310 320 330 340 350 360 

45 orf 20a . pep EQFSALLDWGLRXCMLLTLPAAVGMAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG 

I I I I I I I I M I 1 I (I I I II I I I I : I I I I II I I II I I II I I I I I I I II I I I M I I I I M I 
orf 20-1 EQFSALLDWGLRLCMLLTLPAAVGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG 

310 320 330 340 350 360 

50 370 380 390 400 410 420 

orf 20a . pep LIGLIMIKVLAPGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHVGLSLAIGLGACI 
I I I II I I I I II I II M I II M I I I I I I I I I I I II I I I I I II I I I I I I I I I M I II I I I I I 
orf 20-1 LIGLIMIKVLAPGFYARQNIKT PVKIAIFTLICTQLMNLAFIGPLKHVGLSLAIGLGACI 

370 380 390 400 410 420 

55 

430 440 450 460 470 480 

orf 20a . pep NAG LLFYLLRRHGIYQPGKG W AA FL AKMLL S L A VMG GG L Y AAQ I W L P F D W AHAG GMQKAA 

I M I II I I I I I I I I I I I M I II I I I I 1 I 1 1 I II M 111:111 : II I : I I I I I I I : I I : 
orf 20-1 NAGLLFYLLRRHGIYQPGKGWAAFLAKMLLSLAVMCGGLWAAQAYLPFEWAHAGGMRKAG 
60 430 440 450 460 470 480 

490 500 510 

orf 20a, pep RL F I L I AVG GGL Y FAS L AALG FR P RH FKRVE S X 

: I I I M I M I I I I I M II I I I I I I I II I I I : I 
65 orf 20-1 QLCILIAVGGGLYFASLAALGFRPRHFKRVENX 

490 500 510 






Homology with a predicted ORF from N. gonorrhoeae 

ORF20 shows 92.1% identity over a 454aa overlap with a predicted ORF (ORF20ng) from N.
gonorrhoeae:



orf20.pep  MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 60
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf20ng    MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 60

orf20.pep  AQAFVPILAEYKETRSKEAXEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPSFAQD 120
           ||||||||||||||||||| |||||||||||||||  |||||||||||||||||| |  |
orf20ng    AQAFVPILAEYKETRSKEATEAFIRHVAGMLSFVLIVVTALGILAAPWVIYVSAPGFTKD 120

orf20.pep  ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFGIPAFTPXFLNVSFIVFALFFVP 180
           |||||||| |||||||||||||||||||| |||||||||||||| ||| |||||||||||
orf20ng    ADKFQLSISLLRITFPYILLISLSSFVGSILNSYHKFGIPAFTPTFLNISFIVFALFFVP 180

orf20.pep  YFDPPVTAXAWAVFVGGILQLXFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV 240
           |||||||| |||||||||||| ||||||||||||||||| ||||||||||||||||||||
orf20ng    YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLNFKDAAVNRVMKQMAPAILGV 240

orf20.pep  SVAQVSLVINTIFASYLQSGSVSWMYYADRMMELPSGVLGAALGTILLPTLSKHSANQDT 300
           |||| |||||||||||||||||||||||||||||| ||||||||||||||||||||||||
orf20ng    SVAQISLVINTIFASYLQSGSVSWMYYADRMMELPGGVLGAALGTILLPTLSKHSANQDT 300

orf20.pep  EQFSALLDWGLRLCMLLTLPAAVGLAVLSFPLVATLFMYRXFTLFDAQMTQHALIAYSFG 360
           |||||||||||||||||||||| ||||||||||||||||| |||||||||||||||||||
orf20ng    EQFSALLDWGLRLCMLLTLPAAAGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG 360

orf20.pep  LIGLIMIKVLAPGFYARQNIXXPVKIAIFTLICXQLMNLXFXGPLXXIGLSLAIGLGACI 420
           ||||||||||| ||||||||  ||||||||||| ||||| | |||   ||||||||||||
orf20ng    LIGLIMIKVLASGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHAGLSLAIGLGACI 420

orf20.pep  NAGLLFYLLRRHGIYQPXQGLGSVLXQKCCSRSP 454
           |||||| | | |||| | ||||     |||||||
orf20ng    NAGLLFFLFRKHGIYRPGQGLGQPSWRKCCSRSP 454



An ORF20ng nucleotide sequence <SEQ ID 119> was predicted to encode a protein having amino
acid sequence <SEQ ID 120>:

1 MNMLGALAKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL

51 LRRVFAEGAF AQAFVPILAE YKETRSKEAT EAFIRHVAGM LSFVLIVVTA

101 LGILAAPWVI YVSAPGFTKD ADKFQLSISL LRITFPYILL ISLSSFVGSI

151 LNSYHKFGIP AFTPTFLNIS FIVFALFFVP YFDPPVTALA WAVFVGGILQ

201 LGFQLPWLAK LGFLKLPKLN FKDAAVNRVM KQMAPAILGV SVAQISLVIN

251 TIFASYLQSG SVSWMYYADR MMELPGGVLG AALGTILLPT LSKHSANQDT

301 EQFSALLDWG LRLCMLLTLP AAAGLAVLSF PLVATLFMYR EFTLFDAQMT

351 QHALIAYSFG LIGLIMIKVL ASGFYARQNI KTPVKIAIFT LICTQLMNLA

401 FIGPLKHAGL SLAIGLGACI NAGLLFFLFR KHGIYRPGQG LGQPSWRKCC

451 SRSP*



Further DNA sequence analysis revealed the following DNA sequence <SEQ ID 121>:

1 ATGAATATGC TTGGAGCTTT GGCAAAAGTC GGCAGCCTGA CGATGGTGTC

51 GCGCGTTTTG GGATTTGTGC GCGATACGGT CATTGCGCGG GCATTCGGCG

101 CGGGTATGGC GACGGATGCG TTTTTTGTCG CGTTCAAACT GCCCAACCTG

151 CTTCGCCGCG TGTTTGCGGA GGGGGCGTTT GCCCAAGCGT TTGTGCCGAT

201 TTTGGCGGAA TATAAGGAAA CGCGTTCTAA AGAGGCGAcg gAGGCTTTTA

251 TCCGCCACGt tgcgggAatg CTGTCGTTTG TGCTGATcgt cGttacCGCG

301 CTGGGCATAC TTGCCGCgcc tTGGGTGATT TATGTTtccg CgcccGGCTT

351 TACCAAAGAC GCGGACAAGT TCCAACTTTC CATCAGCCTG CTGCGGATTA

401 CGTTTCCTTA TATATTATTG ATTTCTTTGT CTTCTTTTGT CGGCTCGATA

451 CTCAATTCCT ACCATAAGTT CGGCATTCCC GCGTTTACGC CCACGTTTTT

501 AAACATCTCT TTTATCGTAT TCGCACTGTT TTTCGTGCCG TATTTCGATC

551 CGCCCGTTAC CGCGCTGGCG TGGGCGGTTT TTGTCGGCGG TATTTTGCAG

601 CTCGGTTTCC AACTGCCGTG GCTGGCGAAA CTGGGCTTTT TGAAACTGCC

651 CAAACTGAAT TTCAAAGATG CGGCGGTCAA CCGCGTCATG AAACAGATGG

701 CGCCTGCGAT TTTGGGCGTG agcgTGGCGC AAATTTCTTT GgttATCAAC

751 ACGATTTTCG CGTCTTATCT GCAATCGGGC AGCGTTTCAT GGATGTatta

801 cgCCGACCGC ATGATGGAGc tgcgccGGGG CGTGCTGGGG GCTGCACTCG

851 GTACAATTTT GCTGCCGACT TTGTCCAAAC ACTCGGCAAA CCAAGATACG

901 GAACAGTTTT CCGCCCTGCT CGACTGGGGT TTGCGCCTGT GCATGCTGCT

951 GACGCTGCCG GCGGCGGccg GACTGGCGGT ATTGTCGTTC CCGCTGGTGG

1001 CGACGCTGTT TATGTACCGA GAATTCACGC TGTTTGACGC ACAAATGACG

1051 CAACACGCGC TGATTGCCTA TTCTTTCGGT TTAATCGGTT TAATTATGAT

1101 TAAAGTGTTG GCATCCGGCT TTTATGCGCG GCAAAACATC AAAACGCCCG

1151 TCAAAATCGC CATCTTCACG CTCATCTGCA CGCAGTTGAT GAACCTCGCC

1201 TTTATCGGTC CGTTGAAACA CGCCGGGCTT TCGCTCGCCA TCGGCCTGGG

1251 CGCGTGCATC AACGCCGGAT TGTTGTTCTT CCTGTTGCGC AAACACGGTA

1301 TTTACCGGCC cggcaggggt tgggcggcgt TCTTGGCGAA AATGCTGCTC

1351 GCGCTCGCCG TGATGTGCGG CGGACTGTGG GCGGCGCAGG CTTGCCTGCC

1401 GTTCGAATGG GCGCACGCCG GCGGAATGCG GAAAGCGGGG CAGCTCTGCA

1451 TCCTGATTGC CGTCGGCGGC GGACTGTATT TCGCATCTCT GGCGGCTTTG

1501 GGCTTCCGTC CGCGCCATTT CAAACGCGTG GAAAGCTGA



This encodes the following amino acid sequence <SEQ ID 122; ORF20ng-1>:

1 MNMLGALAKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL

51 LRRVFAEGAF AQAFVPILAE YKETRSKEAT EAFIRHVAGM LSFVLIVVTA

101 LGILAAPWVI YVSAPGFTKD ADKFQLSISL LRITFPYILL ISLSSFVGSI

151 LNSYHKFGIP AFTPTFLNIS FIVFALFFVP YFDPPVTALA WAVFVGGILQ

201 LGFQLPWLAK LGFLKLPKLN FKDAAVNRVM KQMAPAILGV SVAQISLVIN

251 TIFASYLQSG SVSWMYYADR MMELRRGVLG AALGTILLPT LSKHSANQDT

301 EQFSALLDWG LRLCMLLTLP AAAGLAVLSF PLVATLFMYR EFTLFDAQMT

351 QHALIAYSFG LIGLIMIKVL ASGFYARQNI KTPVKIAIFT LICTQLMNLA

401 FIGPLKHAGL SLAIGLGACI NAGLLFFLLR KHGIYRPGRG WAAFLAKMLL

451 ALAVMCGGLW AAQACLPFEW AHAGGMRKAG QLCILIAVGG GLYFASLAAL

501 GFRPRHFKRV ES*

ORF20ng-1 and ORF20-1 show 95.7% identity in 512 aa overlap:



10 20 30 40 50 60 

orf 20-1 . pep MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 
I 1 I I M I I I I I I I 1 I I I I I I 1 I I I I I M 11 I I I ! I ! I 1 I i I I I 11 I M I I I I 11 I I I I I I 
orf20ng-l MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGb4ATDAFFVAFKLPNLLRRVFAEGAF 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 20-1 . pep AQAFVPILAEYKETRSKEAAEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPGFAQD 
I I 1 I I I I 1 I I i I I I t I It I : I i M t I M ( I t t I I t : : I I I M I I t I I I t I I I I I I t I : : I 
orf20ng-l AQAFVPILAEYKETRSKEATEAFIRHVAGMLSFVL I WTALGILAAPWVI YVSAPGFTKD 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 20-1. pep ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFGIPAFTPTFLNVSFIVFALFFVP 
I I I I I I I I : I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I 
orf20ng-l ADKFQLS I SLLRITFPYILLISLSSFVGS I LNSYHKFG I PAFTPTFLNIS FIVFALFFVP 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf 20-1 .pep YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV 
t I i I I I I I I 1 I t i 1 I I I I 1 II I I t ! I I I I I I t I II I M t : I I I II 1 I I I 11 I 1 ! I I 1 I I 1 
orf20ng-l YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLNFKDAAVNRVMKQMAPAILGV 

190 200 210 220 230 240 



250 260 270 280 290 300 

orf 20-1 . pep SVAQVSLVINTIFASYLQSGSVSWMYYADRMMELPSGVLGAALGTILLPTLSKHSANQDT 
II II : II I I I I 1 I I I II 1 1 ! I 1 I 1 I I 1 II I I I I I I 1 I I I II I M I I 1 I I I I M I It I I 
orf20ng-l S VAQI S LV INT I FAS YLQSGS VSWMYYADRMMELRRGVLGAALGT I LLPTLSKHSANQDT 

250 260 270 280 290 300 



310 320 330 340 350 360 

orf 20-1 .pep EQFSALLDWGLRLCMLLTLPAAVGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG 
I M I II I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf20ng-l EQFSALLDWGLRLCMLLTLPAAAGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG 
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310 



320 



330 



340 



350 



360 



10 



15 



25 



30 



35 



40 



45 



50 



55 



60 



orf 20-1. pep 
orf20ng-l 

orf20-l .pep 
orf20ng-l 

orf 20-1. pep 
orf20ng-l 



370 380 390 400 410 420 

LIGLIMIKVLAPGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHVGLSLAIGLGACI 
| | | M M II II I I M I I I I I t M I I M I I I I M I I I I I I I I I I I I 1 H I I I I M I I M 1 
LIGLIMIKVLASGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHAGLSLAIGLGACI 

370 380 390 400 410 420 

430 440 450 460 470 480 

NAGLLFYLLRRHGIYQPGKGWAAFLAKMLLSLAVMCGGLWAAQAYLPFEWAHAGGMRKAG 

| | | | | | : | | | : | M | : ! ! : II I I II I I I I I : I I I I II I I M ft I I I I I I I I M M M I I 
orf20ng-1   NAGLLFFLLRKHGIYRPGRGWAAFIAKMLLALAVMCGGLWAAQACLFFEWAHAGGMRKAG
430 440 450 460 470 480 

490 500 510 

orf20-1.pep QLCILIAVGGGLYFASLAALGFRPRHFKRVENX
I I II I I I I I I I I M I I I I I I I M I I I I II I I : I 
orf20ng-1   QLCILIAVGGGLYFASLAALGFRPRHFKRVESX

490 500 510 



In addition, ORF20ng-1 shows significant homology with a virulence factor of S.typhimurium:

sp|P37169|MVIN_SALTY VIRULENCE FACTOR MVIN pir||S40271 mviN protein - Salmonella
typhimurium gi|438252 (Z26133) mviB gene product [Salmonella typhimurium]
gnl|PID|d1005521 (D25292) ORF2 [Salmonella typhimurium] Length = 524
Score = 1573 (750.1 bits), Expect = 1.1e-220, Sum P(2) = 1.1e-220
Identities = 309/467 (66%), Positives = 368/467 (78%)



Query: 1   MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 60
           MN+L +LA V S+TM SRVLGF RD ++AR FGAGMATDAFFVAFKLPNLLRR+FAEGAF
Sbjct: 14  MNLLKSLAAVSSMTMFSRVLGFARDAIVARIFGAGMATDAFFVAFKLPNLLRRIFAEGAF 73

Query: 61  AQAFVPILAEYKETRSKEATEAFIRHVAGMLSFVLIVVTALGILAAPWVIYVSAPGFTKD 120
           +QAFVPILAEYK  + +EAT  F+ +V+G+L+  L VVT  G+LAAPWVI V+APGF
Sbjct: 74  SQAFVPILAEYKSKQGEEATRIFVAYVSGLLTLALAVVTVAGMLAAPWVIMVTAPGFADT 133

Query: 121 ADKFQLSISLLRITFPYILLISLSSFVGSILNSYHKFGIPAFTPTFLNISFIVFALFFVP 180
           ADKF L+  LLRITFPYILLISL+S VG+ILN++++F IPAF PTFLNIS I FALF  P
Sbjct: 134 ADKFALTTQLLRITFPYILLISLASLVGAILNTWNRFSIPAFAPTFLNISMIGFALFAAP 193

Query: 181 YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLNFKDAAVNRVMKQMAPAILGV 240
           YF+PPV ALAWAV VGG+LQL +QLP+L K+G L LP++NF+D    RV+KQM PAILGV
Sbjct: 194 YFNPPVLALAWAVTVGGVLQLVYQLPYLKKIGMLVLPRINFRDTGAMRVVKQMGPAILGV 253

Query: 241 SVAQISLVINTIFASYLQSGSVSWMYYADRMMELRRGVLGAALGTILLPTLSKHSANQDT 300
           SV+QISL+INTIFAS+L SGSVSWMYYADR+ME   GVLG ALGTILLP+LSK  A+ +
Sbjct: 254 SVSQISLIINTIFASFLASGSVSWMYYADRLMEFPSGVLGVALGTILLPSLSKSFASGNH 313

Query: 301 EQFSALLDWGLRLCMLLTLPAAAGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG 360
           +++  L+DWGLRLC LL LP+A  L +L+ PL  +LF Y +FT FDA MTQ ALIAYS G
Sbjct: 314 DEYCRLMDWGLRLCFLLALPSAVALGILAKPLTVSLFQYGKFTAFDAAMTQRALIAYSVG 373

Query: 361 LIGLIMIKVLASGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHAGLSLAIGLGACI 420
           LIGLI++KVLA GFY+RQ+IKTPVKIAI TLI TQLMNLAFIGPLKHAGLSL+IGL AC+
Sbjct: 374 LIGLIVVKVLAPGFYSRQDIKTPVKIAIVTLIMTQLMNLAFIGPLKHAGLSLSIGLAACL 433

Query: 421 NAGLLFFLLRKHGIYRPGRGWXXXXXXXXXXXXVMCGGLWAAQACLP 467
           NA LL++ LRK  I+ P  GW            VM   L+     +P
Sbjct: 434 NASLLYWQLRKQNIFTPQPGWMWFLMRLIISVLVMAAVLFGVLHIMP 480

Score = 70 (33.4 bits), Expect = 1.1e-220, Sum P(2) = 1.1e-220
Identities = 14/41 (34%), Positives = 23/41 (56%) 



Query: 469 EWAHAGGMRKAGQLCILIAVGGGLYFASLAALGFRPRHFKR 509

EW+ + + +L ++ G YFA+LA LGF+ + F R 
Sbjct: 481 EWSQGSMLWRLLRLMAVVIAGIAAYFAALAVLGFKVKEFVR 521 
Based on this analysis, including the homology with a virulence factor from S.typhimurium, it is
predicted that these proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.
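
The identity and similarity percentages quoted in the BLAST summary lines above can be re-derived from the raw counts. The following is a minimal Python sketch, not part of the original analysis. The function name `parse_blast_summary` is hypothetical, and truncating (rather than rounding) the percentage is an assumption inferred from the figures reported here, not a documented property of the program that produced them.

```python
import re

# Matches fields such as "Identities = 309/467" in a BLAST summary line.
SUMMARY_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)")


def parse_blast_summary(line):
    """Return {field: (count, total, percent)} for one summary line.

    The percentage is truncated toward zero (assumption, see lead-in).
    """
    stats = {}
    for name, count, total in SUMMARY_RE.findall(line):
        count, total = int(count), int(total)
        stats[name] = (count, total, 100 * count // total)
    return stats


if __name__ == "__main__":
    line = "Identities = 309/467 (66%), Positives = 368/467 (78%)"
    print(parse_blast_summary(line))
    # {'Identities': (309, 467, 66), 'Positives': (368, 467, 78)}
```

Applied to the ORF20ng-1 comparison, the sketch reproduces the 66% identity and 78% similarity stated for the 467-residue overlap.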

Example 15 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 123>:

1 atGATTAAAA TCAAAAAAGG TCTAAACCTG CCCATCGCGG GCAGACCGGA 

51 GCAAGCCGTT tACGACGGCC CGGCCaTTAC CGAAGtCGCG TTGCTTGGCG 

101 AAGAATATGC CGGTATGCGC CCCTCGATGA AAGTCAAGGA AGGCGATGCC 

151 GTcAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGAATC CGGGCGTGGT 

201 GTTTACTGCG CCGGCTTCAG GcAAAATCGC CGCGATTCAC CGTGGCGAAA

251 AGCGCGTACT TCAGTCAGTC GTGATTGCCG TTGAArGCAA CGACGAAATC 

301 GAGTTTGAAC GCTACGCACC TGAAGCGCTG GCAAACTTAA GCGGCGAAGA 

351 AGTGCGCCGC AACCTGATCC AATCCGGTTT GTGGACTGCG CTGCGCACCC 

401 GTCCGTTCAG CAAAATTCCT GCCGTCGATG CCGAGCCGTT CGCCATCTTC

451 GTCAATGCGA tGGACACCAA TCCG..

This corresponds to the amino acid sequence <SEQ ID 124; ORF22>: 

1 MIKIKKGLNL PIAGRPEQAV YDGPAITEVA LLGEEYAGMR PSMKVKEGDA 

51 VKKGQVLFED KKNPGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEXNDEI 

101 EFERYAPEAL ANLSGEEVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF 

151 VNAMDTNP..
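
The correspondence between a nucleotide sequence and the amino acid sequence it encodes can be checked mechanically. The sketch below is illustrative only. It assumes the standard genetic code, reads the sequence in frame from its first base, and reports any codon containing a character other than A, C, G or T as X, which appears to be how ambiguous bases are reflected in the amino acid sequences given here. The names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (assumption:
# the standard table is adequate for these coding sequences).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string in frame 1; '*' marks a stop codon.

    Whitespace is ignored and case is folded. Codons containing an
    ambiguity code (e.g. 'r' or 'N') are reported as 'X'. A trailing
    partial codon is dropped.
    """
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


if __name__ == "__main__":
    first_row = "atGATTAAAA TCAAAAAAGG TCTAAACCTG CCCATCGCGG GCAGACCGGA"
    print(translate(first_row))  # MIKIKKGLNLPIAGRP
```

Translating the first row of SEQ ID 123 gives the first sixteen residues of ORF22, and the ambiguous codon `rGC` in row 251 accounts for the X at residue 96.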

Further work revealed the complete nucleotide sequence <SEQ ID 125>: 

1 ATGATTAAAA TCAAAAAAGG TCTAAACCTG CCCATCGCGG GCAGACCGGA 

51 GCAAGCCGTT TACGACGGCC CGGCCATTAC CGAAGTCGCG TTGCTTGGCG 

101 AAGAATATGC CGGTATGCGC CCCTCGATGA AAGTCAAGGA AGGCGATGCC 

151 GTCAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGAATC CGGGCGTGGT

201 GTTTACTGCG CCGGCTTCAG GCAAAATCGC CGCGATTCAC CGTGGCGAAA 

251 AGCGCGTACT TCAGTCAGTC GTGATTGCCG TTGAAGGCAA CGACGAAATC 

301 GAGTTTGAAC GCTACGCACC TGAAGCGCTG GCAAACTTAA GCGGCGAAGA 

351 AGTGCGCCGC AACCTGATCC AATCCGGTTT GTGGACTGCG CTGCGCACCC 

401 GTCCGTTCAG CAAAATTCCT GCCGTCGATG CCGAGCCGTT CGCCATCTTC

451 GTCAATGCGA TGGACACCAA TCCGCTGGCT GCCGACCCTA CGGTCATTAT

501 CAAAGAAGCC GCCGAGGATT TCAAACGCGG CCTGTTGGTA TTGAGCCGTT 

551 TGACCGAACG CAAAATCCAT GTTTGTAAGG CAGCTGGCGC AGACGTGCCG 

601 TCTGAAAATG CTGCCAACAT CGAAACACAT GAATTCGGCG GCCCGCATCC 

651 TGCCGGTTTG AGTGGCACGC ACATTCATTT CATCGAGCCG GTCGGCGCGA

701 ATAAAACCGT GTGGACCATC AATTATCAAG ATGTAATTAC CATTGGCCGT 

751 TTGTTTGCAA CAGGCCGTCT GAACACCGAG CGCGTGATTG CCCTAGGTGG 

801 TTCTCAAGTC AACAAACCGC GCCTCTTGCG TACCGTTTTG GGTGCGAAAG 

851 TATCGCAAAT TACTGCGGGC GAATTGGTTG ACACAGACAA CCGCGTGATT 

901 TCCGGTTCGG TATTGAACGG CGCGATTACA CAAGGCGCGC ACGATTATTT

951 GGGACGCTAC CACAATCAGA TTTCCGTTAT CGAAGAAGGC CGCAGCAAAG 

1001 AGCTGTTCGG CTGGGTTGCG CCGCAGCCGG ACAAATACTC CATCACGCGT 

1051 ACAACCCTCG GCCATTTCCT GAAAAACAAA CTCTTCAAGT TCAACACAGC 

1101 CGTCAACGGC GGCGACCGCG CCATGGTGCC GATTGGTACT TACGAGCGCG 

1151 TGATGCCCTT GGATATCCTG CCCACCCTGC TTTTGCGCGA TTTAATCGTC

1201 GGCGATACCG ACAGCGCGCA GGCATTGGGT TGCTTGGAAT TGGACGAAGA 

1251 AGACCTCGCT TTGTGCAGCT TCGTCTGCCC GGGCAAATAC GAATACGGCC 

1301 CGCTGTTGCG CAAAGTGCTG GAAACCATTG AGAAGGAAGG CTGA 

This corresponds to the amino acid sequence <SEQ ID 126; ORF22-1>:

1 MIKIKKGLNL PIAGRPEQAV YDGPAITEVA LLGEEYAGMR PSMKVKEGDA

51 VKKGQVLFED KKNPGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEGNDEI

101 EFERYAPEAL ANLSGEEVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF

151 VNAMDTNPLA ADPTVIIKEA AEDFKRGLLV LSRLTERKIH VCKAAGADVP

201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWTI NYQDVITIGR

251 LFATGRLNTE RVIALGGSQV NKPRLLRTVL GAKVSQITAG ELVDTDNRVI

301 SGSVLNGAIT QGAHDYLGRY HNQISVIEEG RSKELFGWVA PQPDKYSITR

351 TTLGHFLKNK LFKFNTAVNG GDRAMVPIGT YERVMPLDIL PTLLLRDLIV

401 GDTDSAQALG CLELDEEDLA LCSFVCPGKY EYGPLLRKVL ETIEKEG*

Further work identified the corresponding gene in strain A of N. meningitidis <SEQ ID 127>: 

1 ATGATTAAAA TCAAAAAAGG TCTAAACCTG CCCATCGCGG GCAGACCGGA 

51 GCAAGTCATT TATGACGGGC CCGTCATTAC CGAAGTCGCG TTGCTTGGCG 

101 AAGAATATGC CGGTATGCGC CCCTNGATGA AAGTCAAGGA AGGCGATGCC 

151 GTCAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGNATC CGGGCGTGGT 

201 GTTTACCGCG CCNGTTTCAG GCAAAATCGC CGCCATCCAT CGCGGCGAAA 

251 AGCGCGTACT TCAGTCGGTC GTGATTGCCG TTGAAGGCAA CGACGAAATC 

301 GAGTTCGAAC GCTACGCGCC CGAAGCGTTG GCAAACTTAA GCGGCGANGA 

351 ANTNNGNNGC AATCTGATCC AATCCGGTTT GTGGACTGCG CTGCGTANCC 

401 GTCCGTTCAG CAAAATCCCT GCCGTCGATG CCGAGCCGTT CGCCATCTTC 

451 GTCAATGCGA TGGACACCAA TCCGCTNGCG GCAGACCCTG TGGTTGTGAT 

501 CAAAGAAGCC GNCGANGATT TCAGACGANG TNTGCTGGTA TTGAGCCGTT 

551 TGACCGAGCG TAAAATCCAT GTGTGTAAGG CAGCTGGCGC AGACGTGCCG 

601 TCTGAAAATG CTGCCAACAT CGAAACACAT GAATTCGGCG GCCCGCATCC 

651 GGCCGGTTTG AGTGGCACGC ACATTCATTT CATTGAGCCG GTCGGTGCAA 

701 ACAAAACCGT TTGGACCATC AATTATCAAG ATGTAATTGC CATCGGACGT 

751 TTGTTTGCAA CAGGCCGTCT GAACACCGAG CGCGTGATTG CTTTGGGTGG 

801 TTCTCAAGTC AACAAACCAC GCCTCTTGCG TACCGTTTTG GGTGCGAAAG 

851 TATCGCAAAT TACTGCGGGC GAATTGGTTG ACGCAGACAA CCGCGTGATT 

901 TCCGGTTCGG TATTGAACGG CGCGATTACA CAAGGCGCGC ACGATTATTT 

951 GGGACGCTAC CACAATCAGA TTTCCGTTAT CGAAGAAGGC CGCAGCAAAG 

1001 AGCTGTTCGG CTGGGTTGCG CCGCAGCCGG ACAAATACTC CATCACGCGT 

1051 ACGACCCTCG GCCATTTCCT GAAAAACAAA CTCTTCAAGT TCACGACAGC 

1101 CGTCAACGGT GGCGACCGCG CCATGGTGCC GATTGGTACT TACGAGCGCG 

1151 TAATGCCGCT AGACATCCTG CCTACCCTGC TTTTGCGCGA TTTAATCGTC 

1201 GGCGATACCG ACAGCGCGCA AGCATTGGGT TGCTTGGAAT TGGACGAAGA 

1251 AGACCTCGCT TTGTGCAGCT TCGTCTGCCC GGGCAAATAC GAATANGGCC 

1301 CGCTGTTGCG TAAGGTGCTG GAAACCNTTG AGAAGGAAGG CTGA 

This encodes a protein having amino acid sequence <SEQ ID 128; ORF22a>: 

1 MIKIKKGLNL PIAGRPEQVI YDGPVITEVA LLGEEYAGMR PXMKVKEGDA 

51 VKKGQVLFED KKXPGVVFTA PVSGKIAAIH RGEKRVLQSV VIAVEGNDEI 

101 EFERYAPEAL ANLSGXEXXX NLIQSGLWTA LRXRPFSKIP AVDAEPFAIF 

151 VNAMDTNPLA ADPVVVIKEA XXDFRRXXLV LSRLTERKIH VCKAAGADVP

201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWTI NYQDVIAIGR 

251 LFATGRLNTE RVIALGGSQV NKPRLLRTVL GAKVSQITAG ELVDADNRVI 

301 SGSVLNGAIT QGAHDYLGRY HNQISVIEEG RSKELFGWVA PQPDKYSITR 

351 TTLGHFLKNK LFKFTTAVNG GDRAMVPIGT YERVMPLDIL PTLLLRDLIV 

401 GDTDSAQALG CLELDEEDLA LCSFVCPGKY EXGPLLRKVL ETXEKEG*

The originally-identified partial strain B sequence (ORF22) shows 94.2% identity over a 158aa 
overlap with ORF22a: 

10 20 30 40 50 60 

orf22 .pep MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED 

I I I I I I I I I I I I I I I I I I :: I I I I : I I I I j M i i II I i I I I I I I I I I I I I I I I I I I II I 
orf22a MIKIKKGLNLPIAGRPEQVIYDGPVITEVALLGEEYAGMRPXMKVKEGDA VKKGQVLFED 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 22 . pep KKNPGWFTAPASGKIAAIHRGEKRVLQSWIAVEXNDEIEFERYAPEALANLSGEEVRR 

II I I I I II I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 22a KKXPGVVFTAPVSGKIAAIHRGEKRVLQSWIAVEGNDEIEFERYAPEALANLSGXEXXX 

70 80 90 100 110 120 

130 140 150 

orf22.pep NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNP
I I I II I I II I I I : I I II II I I I I I I I I I II I ! I I I i I I 
orf 22a NLIQSGLWTALRXRPFSKIPAVDAEPFAIFVNAMDTNPLAADPVWIKEAXXDFRRXXLV 

130 140 150 160 170 180 
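
The percent identity figures quoted for these overlaps are, in essence, the number of aligned positions carrying the same residue divided by the length of the overlap. The sketch below illustrates that calculation on two equal-length, pre-aligned strings. The function name `percent_identity` is hypothetical, and the handling of the unknown residue X (never counted as a match) is an assumption; the program used for the original comparisons may treat X differently, so small differences from the quoted values are to be expected.

```python
def percent_identity(seq_a, seq_b):
    """Return (matches, overlap length, percent identity) for two
    pre-aligned sequences of equal length.

    Gap characters ('-') and unknown residues ('X') are never counted
    as matches (assumption, see lead-in).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    if not seq_a:
        raise ValueError("empty alignment")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a not in "-X")
    return matches, len(seq_a), 100.0 * matches / len(seq_a)


if __name__ == "__main__":
    # First 60 aligned residues of ORF22 and ORF22a, taken from the text.
    orf22 = "MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED"
    orf22a = "MIKIKKGLNLPIAGRPEQVIYDGPVITEVALLGEEYAGMRPXMKVKEGDAVKKGQVLFED"
    matches, length, pct = percent_identity(orf22, orf22a)
    print(f"{matches}/{length} identical ({pct:.1f}%)")
```

Over the first 60 aligned positions the two sequences differ at four positions (19, 20, 25 and 42), so the sketch reports 56 of 60 positions as identical.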

The complete strain B sequence (ORF22-1) and ORF22a show 94.9% identity in 447 aa overlap: 
10 20 30 40 50 60

0rf22a pep MIKIKKGLNLPIAGRPEQVIYDGPVITEVALLGEEYAGMRPXMKVKEGDAVKKGQVLFED 
||||||||||||||||||::||||:|IIIMIIIMIMII I I I I I I I 1 I I I I I I I I I I 
orf 22-1 MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED 
5 10 20 30 40 50 60 

70 80 90 100 110 120 

orf22a pep KKXPGWFTAPVSGKIAAIHRGEKRVLQSWIAVEGNDEIEFERYAPEALANLSGXEXXX 
|| | | | | | | | | : | | | I I I I I I I I I I I I I I I I I I I II I I I 1 1 I I I I I I I I I I I 1 I I 1 
10 orf22-l KKNPGWFTAPASGKIAAIHRGEKRVLQSWIAVEGNDEIEFERYAPEALANLSGEEVRR 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf22a.pep NLIQSGLWTALRXRPFSKIPAVDAEPFAIFVNAMDTNPLAADPVWIKEAXXDFRRXXLV 
IS I I M I I I I I I I I : I M I I I I M 1 I M I M I > I I I I I I I I I 1 I I : 1 : I I I I I I : I II 

orf 22-1 N L I Q S GLWT ALRT RP F S K I P AV DAE P FA I FVN AMDTN P L AAD PT V 1 1 KE AAE D FKRG L L V 

130 140 150 160 170 180 

190 200 210 220 230 240 

20 orf22a.pep LSRLTERKIHVCKAAGADVPSENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTVWTI 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I M I I I II II I II I I 
orf22-l LSRLTERKIHVCKAAGADVPSENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTVWTI 

190 200 210 220 230 240 

25 250 260 270 280 290 300 

orf 22a . pep NYQDVIAIGRLFATGRLNTERVIALGGSQVNKPRLLRTVLGAKVSQITAGELVDADNRVI 
I I I I I I : I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I M I I I : I I I I I 
orf 22-1 NYQDVITIGRLFATGRLNTERVIALGGSQVNKPRLLRTVLGAKVSQITAGELVDTDNRVI 

250 260 270 280 290 300 

30 

310 320 330 340 350 360 

orf 22a. pep SGSVLNGAITQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFLKNK 
I I M I I M I I I I I II I I I I I I I I I I I I I I I I M I I I I II I I I I I I I I M I I I M I I I I I I 
orf 22-1 SGSVLNGAITQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFLKNK 
35 310 320 330 340 350 360 

370 380 390 400 410 420 

orf 22a . pep LFKFTTAVNGGDRAMVPIGTYERVMPLDILPTLLLRDLIVGDTDSAQALGCLELDEEDLA 
I I I I : I I II I I I I I I I I I I II M I I I II I I I I I I I I I I ! I 1 I I I I I I I I I I II I I I I I II 
40 orf 22-1 LFKFNTAVNGGDRAMVPIGTYERVMPLDILPTLLLRDLIVGDTDSAQALGCLELDEEDLA 

370 380 390 400 410 420 

430 440 
orf 22a. pep LCSFVCPGKYEXGPLLRKVLETXEKEGX 
45 I I I I I I I I I I ! I I I I I I II I I I I I I I 

orf 22-1 LCSFVCPGKYEYGPLLRKVLETIEKEGX 

430 440 

Further work identified a partial gene sequence <SEQ ID 129> from N. gonorrhoeae, which 
encodes the following amino acid sequence <SEQ ID 130; ORF22ng>: 

1 MIKIKKGLNL PIAGRPEQVI YDGPAITEVA LLGEEYVGMR PSMKIKEGEA

51 VKKGQVLFED KKNPGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEGNDEI

101 EFERYVPEAL AKLSSEKVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF

151 VNAMDTNPLA ADPTVIIKEA AEDFKRGLLV LSRLTERKIH VCKAAGADVP

201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWTI NYQDVIAIGR

251 LFVTGRLNTE RVVALGGLQV NKPRLLRTVL GAKVSQLTAG ELVDADNRVI

301 SGSVLNGAIA QGAHDYLGRY HN* 

Further work identified the complete gonococcal gene <SEQ ID 131>:

1 ATGATTAAAA TCAAAAAAGG TCTAAATCTG CCCATCGCGG GCAGACCGGA 

51 GCAAGTCATT TATGACGGCC CGGCCATTAC CGAAGTCGCG TTGCTTGGCG 

101 AAGAATATGT CGGCATGCGC CCCTCGATGA AAATCAAGGA AGGTGAAGCC

151 GTCAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGAATC CGGGCGTAGT 

201 ATTTACTGCG CCGGCTTCAG GCAAAATCGC CGCTATTCAC CGTGGCGAAA 

251 AGCGCGTACT TCAGTCAGTC GTGATTGCCG TTGAAGGCAA CGACGAAATC 

301 GAGTTCGAAC GCTACGTACC TGAAGCGCTG GCAAAATTGA GCAGCGAAAA 
351 AGTGCGCCGC AACCTGATTC AATCAGGCTT ATGGACTGCG CTTCGCACCC

401 GTCCGTTCAG CAAAATCCCT GCCGTAGATG CCGAGCCGTT CGCCATCTTC 

451 GTCAATGCGA TGGACACCAA TCCGCTGGCT GCCGACCCTA CGGTCATCAT 

501 CAAAGAAGCC GCCGAAGACT TCAAACGCGG CCTGTTGGTA TTGAGCCGCC 

551 TGACCGAACG TAAAATCCAT GTGTGTAAAG CAGCAGGCGC AGACGTGCCG 

601 TCTGAAAATG CTGCCAATAT CGAAACACAT GAATTTGGCG GCCCGCATCC 

651 TGCCGGCTTG AGTGGCACGC ACATTCATTT CATCGAGCCA GTCGGCGCGA 

701 ATAAAACCGT GTGGACCATC AATTATCAAG ACGTGATTGC TATCGGACGT 

751 TTGTTCGTAA CAGGCCGTCT GAATACCGAG CGCGTGGTTG CCTTGGGCGG

801 CCTGCAAGTC AACAAACCGC GCCTCTTGCG TACCGTTTTG GGTGCGAAGG 

851 TGTCTCAACT TACCGCCGGC GAATTGGTTG ACGCGGACAA CCGCGTGATT 

901 TCCGGTTCGG TATTGAACGG TGCGATTGCA CAAGGCGCGC ATGATTATTT 

951 GGGACGCTAC CACAATCAGA TTTCCGTTAT CGAAGAAGGC CGCAGCAAAG 

1001 AGCTGTTCGG CTGGGTTGCG CCGCAGCCGG ACAAATACTC CATCACGCGC 

1051 ACCACTCTCG GCCATTTCCT AAAAAACAAA CTCTTCAAGT TCACGACAGC 

1101 CGTCAACGGC GGCGACCGCG CCATGGTACC GATCGGCACT TATGAGCGCG 

1151 TAATGCCGTT GGACATCCTG CCTACCTTGC TTTTGCGCGA TTTAATCGTC 

1201 GGCGATACCG ACAGCGCGCA GGCTTTGGGT TGCTTGGAAT TGGACGAAGA 

1251 AGACCTCGCT TTGTGCAGCT TCGTCTGCCC GGGCAAATAC GAATACGGCC 

1301 CGCTGTTGCG CAAAGTGCTG GAAACCATTG AGAAGGAAGG CTGA 

This encodes a protein having amino acid sequence <SEQ ID 132; ORF22ng-1>:

1 MIKIKKGLNL PIAGRPEQVI YDGPAITEVA LLGEEYVGMR PSMKIKEGEA 

51 VKKGQVLFED KKNPGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEGNDEI 

101 EFERYVPEAL AKLSSEKVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF 

151 VNAMDTNPLA ADPTVIIKEA AEDFKRGLLV LSRLTERKIH VCKAAGADVP 

201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWTI NYQDVIAIGR 

251 LFVTGRLNTE RVVALGGLQV NKPRLLRTVL GAKVSQLTAG ELVDADNRVI

301 SGSVLNGAIA QGAHDYLGRY HNQISVIEEG RSKELFGWVA PQPDKYSITR 

351 TTLGHFLKNK LFKFTTAVNG GDRAMVPIGT YERVMPLDIL PTLLLRDLIV 

401 GDTDSAQALG CLELDEEDLA LCSFVCPGKY EYGPLLRKVL ETIEKEG* 



The originally-identified partial strain B sequence (ORF22) shows 93.7% identity over a 158aa
overlap with ORF22ng:

orf 22 . pep MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED 60 

I I I M ! I I i I M I II I I I : : I I i t I t I I t I I It I 1 t : I t 1 M I t : I II : M I I I I t I I I I 
orf22ng MIKIKKGLNLPIAGRPEQVIYDGPAITEVALLGEEYVGMRPSMKIKEGEAVKKGQVLFED 60 

orf 22 . pep KKNPGVVFTAPASGKIAAIHRGEKRVLQSWIAVEXNDEIEFERYAPEALANLSGEEVRR 120 

I I I II I I I I II I I I M I I I M I I t I I I I I II I I I I I I M I I I II : I I I I I : I I : I : I I I 
orf22ng KKNPGVVFTAPASGKIAAIHRGEKRVLQSWIAVEGNDEIEFERYVPEALAKLSSEKVRR 120 

orf 22 .pep NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNP 158 

I II i { I I I I I I I I I II I I I I I I I I I I I I I I I I I I I M I 
orf22ng N L I Q S GLWT ALRTRP F SKI P AV DAE P FAI FVN AMDTN PLAADPT V I 1 KE AAE D FKRGLLV 180 



The complete sequences from strain B (ORF22-1) and gonococcus (ORF22ng-1) show 96.2%
identity in 447 aa overlap:

10 20 30 40 50 60 

orf22-1.pep MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED
            ||||||||||||||||||::||||||||||||||||:|||||||:|||:|||||||||||
orf22ng-1   MIKIKKGLNLPIAGRPEQVIYDGPAITEVALLGEEYVGMRPSMKIKEGEAVKKGQVLFED
10 20 30 40 50 60

70 80 90 100 110 120

orf22-1.pep KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYAPEALANLSGEEVRR
            |||||||||||||||||||||||||||||||||||||||||||||:|||||:||:|:|||
orf22ng-1   KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYVPEALAKLSSEKVRR
70 80 90 100 110 120 






130 140 150 160 170 180 

orf22-1.pep NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNPLAADPTVIIKEAAEDFKRGLLV
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf22ng-1   NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNPLAADPTVIIKEAAEDFKRGLLV
130 140 150 160 170 180 

190 200 210 220 230 240 

orf 22-1 . pep LSRLTERKIHVCKAAGADVPSENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTVWTI 
I | I I I I I I II I i I I I I I I I I I I I I I I I I t I I I I I I I 1 I I I M I I I I M I I I I \ 1 I I I I I f 
0rf22ng-l LSRLTERKIHVCKAAGADVPSENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTVWTI 

190 200 210 220 230 240 



250 260 270 280 290 300 

orf 22-1. pep NYQDVITIGRLFATGRLNTERVIALGGSQWKPRLLRTVLGAKVSQITAGELVDTDNRVI 
I | | | | I : | I | I I : M I I I I I I I : i I I I t I I I II I I I I I I II I I I I : i I I I I I I : I I I I I 
orf22ng-l NYQDVIAIGRLFVTGRLNTERWALGGLQVNKPRLLRTVLGAKVSQLTAGELVDADNRVI 
15 250 260 270 280 290 300 

310 320 330 340 350 360 

orf 22-1. pep SGSVLNGAITQGAHDYLGRYHNQISVIEEGRSKELFGWVAFQPDKYSITRTTLGHFLKNK 
1 I I I I I I I I : I i I I I II M I I I I I I I I I I I I I I I I II II I I I I I I I I I I 1 I I II I I I I 1 I 
20 orf22ng-l SGSVLNGAIAQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFLKNK 

310 320 330 340 350 360 



370 380 390 400 410 420 

orf22-1.pep LFKFNTAVNGGDRAMVPIGTYERVMPLDILPTLLLRDLIVGDTDSAQALGCLELDEEDLA
            ||||:|||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf22ng-1   LFKFTTAVNGGDRAMVPIGTYERVMPLDILPTLLLRDLIVGDTDSAQALGCLELDEEDLA
370 380 390 400 410 420 

430 440 
orf22-1.pep LCSFVCPGKYEYGPLLRKVLETIEKEGX
            ||||||||||||||||||||||||||||
orf22ng-1   LCSFVCPGKYEYGPLLRKVLETIEKEGX
430 440 

Computer analysis of these sequences gave the following results: 

Homology with 48kDa outer membrane protein of Actinobacillus pleuropneumoniae (accession number U24492).
ORF22 and this 48kDa protein show 72% aa identity in 158aa overlap:



orf22   1 MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED 60
          MI IKKGL+LPIAG P Q +++G  + EVA+LGEEY GMRPSMKV+EGD VKKGQVLFED
48kDa   1 MITIKKGLDLPIAGTPAQVIHNGNTVNEVAMLGEEYVGMRPSMKVREGDVVKKGQVLFED 60

orf22  61 KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEXNDEIEFERYAPEALANLSGEEVRR 120
          KKNPGVVFTAPASG +  I+RGEKRVLQSVVI VE +++I F RY    LA+LS E+V++
48kDa  61 KKNPGVVFTAPASGTVVTINRGEKRVLQSVVIKVEGDEQITFTRYEAAQLASLSAEQVKQ 120

orf22 121 NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNP 158
          NLI+SGLWTA RTRPFSK+PA+DA P +IFVNAMDTNP
48kDa 121 NLIESGLWTAFRTRPFSKVPALDAIPSSIFVNAMDTNP 158



ORF22a also shows homology to the 48kDa Actinobacillus pleuropneumoniae protein: 

gi|1185395 (U24492) 48 kDa outer membrane protein [Actinobacillus pleuropneumoniae]

Length = 449

Score = 530 bits (1351), Expect = e-150

Identities = 274/450 (60%), Positives = 323/450 (70%), Gaps = 4/450 (0%)

Query: 1   MIKIKKGLNLPIAGRPEQVIYDGPVITEVALLGEEYAGMRPXMKVKEGDAVKKGQVLFED 60
           MI IKKGL+LPIAG P QVI++G  + EVA+LGEEY GMRP MKV+EGD VKKGQVLFED
Sbjct: 1   MITIKKGLDLPIAGTPAQVIHNGNTVNEVAMLGEEYVGMRPSMKVREGDVVKKGQVLFED 60

Query: 61  KKXPGVVFTAPVSGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYAPEALANLSGXEXXX 120
           KK PGVVFTAP SG +  I+RGEKRVLQSVVI VEG+++I F RY    LA+LS  +
Sbjct: 61  KKNPGVVFTAPASGTVVTINRGEKRVLQSVVIKVEGDEQITFTRYEAAQLASLSAEQVKQ 120

Query: 121 NLIQSGLWTALRXRPFSKIPAVDAEPFAIFVNAMDTNPLAADPVVVIKEAXXDFRRXXLV 180
           NLI+SGLWTA R RPFSK+PA+DA P +IFVNAMDTNPLAADP VV+KE   DF+    V
Sbjct: 121 NLIESGLWTAFRTRPFSKVPALDAIPSSIFVNAMDTNPLAADPEVVLKEYETDFKDGLTV 180

Query: 181 LSRL--TERKIHVCKAAGADVP-SENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTV 237
           L+RL   ++ +++CK A +++P S     I    F G HPAGL GTHIHF++PVGA K V
Sbjct: 181 LTRLFNGQKPVYLCKDADSNIPLSPAIEGITIKSFSGVHPAGLVGTHIHFVDPVGATKQV 240

Query: 238 WTINYQDVIAIGRLFATGRLNTERVIALGGSQVNKPRLLRTVLGAKVSQITAGELVDADN 297
           W +NYQDVIAIG+LF TG L T+R+I+L G QV  PRL+RT LGA +SQ+TA EL   +N
Sbjct: 241 WHLNYQDVIAIGKLFTTGELFTDRIISLAGPQVKNPRLVRTRLGANLSQLTANELNAGEN 300

Query: 298 RVISGSVLNGAITQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFL 357
           RVISGSVL+GA   G  DYLGRY  Q+SV+ EGR KELFGW+ P  DK+SITRT LGHF
Sbjct: 301 RVISGSVLSGATAAGPVDYLGRYALQVSVLAEGREKELFGWIMPGSDKFSITRTVLGHFG 360

Query: 358 KNKLFKFTTAVNGGDRAMVPIGTYERVMXXXXXXXXXXXXXXVGDTDSAQXXXXXXXXXX 417
           K KLF FTTAV+GG+RAMVPIG YERVM               GDTDSAQ
Sbjct: 361 K-KLFNFTTAVHGGERAMVPIGAYERVMPLDIIPTLLLRDLAAGDTDSAQNLGCLELDEE 419

Query: 418 XXXXXSFVCPGKYEXGPLLRKVLETXEKEG 447
                ++VCPGK   GP+LR  LE  EKEG

ORF22ng-1 also shows homology with the OMP from A.pleuropneumoniae:

gi|1185395 (U24492) 48 kDa outer membrane protein [Actinobacillus
pleuropneumoniae] Length = 449

Score = 555 bits (1414), Expect = e-157

Identities = 284/450 (63%), Positives = 337/450 (74%), Gaps = 4/450 (0%)

Query: 27  MIKIKKGLNLPIAGRPEQVIYDGPAITEVALLGEEYVGMRPSMKIKEGEAVKKGQVLFED 86
           MI IKKGL+LPIAG P QVI++G  + EVA+LGEEYVGMRPSMK++EG+ VKKGQVLFED
Sbjct: 1   MITIKKGLDLPIAGTPAQVIHNGNTVNEVAMLGEEYVGMRPSMKVREGDVVKKGQVLFED 60

Query: 87  KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYVPEALAKLSSEKVRR 146
           KKNPGVVFTAPASG +  I+RGEKRVLQSVVI VEG+++I F RY    LA LS+E+V++
Sbjct: 61  KKNPGVVFTAPASGTVVTINRGEKRVLQSVVIKVEGDEQITFTRYEAAQLASLSAEQVKQ 120

Query: 147
           NLI+SGLWTA RTRPFSK+PA+DA P +IFVNAMDTNPLAADP V++KE   DFK GL V
Sbjct: 121

Query: 207
           L+RL   ++ +++CK A +++P S     I    F G HPAGL GTHIHF++PVGA K V
Sbjct: 181

Query: 264
           W +NYQDVIAIG+LF TG L T+R+++L G QV  PRL+RT LGA +SQLTA EL   +N
Sbjct: 241

Query: 324
           RVISGSVL+GA A G  DYLGRY  Q+SV+ EGR KELFGW+ P  DK+SITRT LGHF
Sbjct: 301

Query: 384
           K KLF FTTAV+GG+RAMVPIG YERVM               GDTDSAQ
Sbjct: 361

Query: 444
                ++VCPGK  YGP+LR  LE IEKEG
Sbjct: 420



Based on this analysis, including the homology with the outer membrane protein of Actinobacillus 
pleuropneumoniae, it was predicted that these proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
ORF22-1 (35.4kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
5A shows the results of affinity purification of the GST-fusion protein, and Figure 5B shows the
results of expression of the His-fusion in E.coli. Purified GST-fusion protein was used to immunise
mice, whose sera were used for ELISA (positive result) and FACS analysis (Figure 5C). These
experiments confirm that ORF22-1 is a surface-exposed protein, and that it is a useful immunogen.

Example 16 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 133>:

1 . . GCGnCGnAAA TCATCCATCC CC. .nACGTC GTAGGCCCTG AAGCCAACTG 

51 GTTTTTTATG GTAGCCAGTA CGTTTGTGAT TGCTTTGATT GGTTATTTTG 

101 TTACTGAAAA AATCGTCGAA CCGCAATTGG GCCCTTATCA ATCAGATTTG 

151 TCACAAGAAG AAAAAGACAT TCGGCATTCC AATGAAATCA CGCCTTTGGA 

201 ATATAAAGGA TTAATTTGGG CTGGCGTGGT GTTTGTTGCC TTATCCGCCC 

251 TATTGGCTTG GAGCATCGTC CCTGCCGACG GTATTTTGCG TCATCCTGAA 

301 ACAGGATTGG TTTCCGGTTC GCCGTTTTTA AAATCGATTG TTGTTTTTAT 

351 TTTCTTGTTG TTTGCACTGC CGGGCATTGT TTATGGCCGG GTAACCCGAA 

401 GTTTGCGCGG CGAACAGGAA GTCGTTAATG CGmyGGCCGA ATCGATGAGT 

451 ACTCTGGsGC TTTmTTTGsw CAkcATCTTT TTTGCCGCAC AGTTTGTCGC 

501 ATTTTTTAAT TGGACGAATA TTGGGCAATA TATTGCCGTT AAAGGGGCGA 

551 CGTTCTTAAA AGAAGTCGGC TTGGGCGGCA GCGTGTTGTT TATCGGTTTT 

601 ATTTTAATTT GTGCTTTTAT CAATCTGATG ATAGGCTCCG CCTCCGCGCA 

651 ATGGGCGGTA ACTGCGCCGA TTTTCGTCCC TATGCTGATG TTGGCCGGCT 

701 ACGCGCCCGA AGTCATTCAA GCCGCTTACC GCATCGGTGA TTCCGTTACC 

751 AATATTATTA CGCCGATGAT GAGTTATTTC GGGCTGATTA TGGCGACGGT 

801 GrkCmmmTAC AAAAAAGATG CGGGCGTGGG TaCGcTGATT wCTATGATGT 

851 TGCCGTATTC CGCTTTCTTC TTGATTGCgT GGATTGCCTT ATTCTGCATT 

901 TGGGTATTTg TTTTGGGCCT GCCCGTCGGT CCCGGCGCGC CCACATTCTA 

951 TCCCGCACCT TAA 

This corresponds to the amino acid sequence <SEQ ID 134; ORF12>: 

1 ..AXXIIHPXXV VGPEANWFFM VASTFVIALI GYFVTEKIVE PQLGPYQSDL 

51 SQEEKDIRHS NEITPLEYKG LIWAGVVFVA LSALLAWSIV PADGILRHPE

101 TGLVSGSPFL KSIVVFIFLL FALPGIVYGR VTRSLRGEQE VVNAXAESMS

151 TLXLXLXXIF FAAQFVAFFN WTNIGQYIAV KGATFLKEVG LGGSVLFIGF 

201 ILICAFINLM IGSASAQWAV TAPIFVPMLM LAGYAPEVIQ AAYRIGDSVT 

251 NIITPMMSYF GLIMATVXXY KKDAGVGTLI XMMLPYSAFF LIAWIALFCI 

301 WVFVLGLPVG PGAPTFYPAP * 

Further sequence analysis revealed the complete DNA sequence <SEQ ID 135> to be:



1 ATGAGTCAAA CCGATACGCA ACGGGACGGA CGATTTTTAC GCACAGTCGA 

51 ATGGCTGGGC AATATGTTGC CGCATCCGGT TACGCTTTTT ATTATTTTCA 

101 TTGTGTTATT GCTGATTGCC TCTGCCGTCG GTGCGTATTT CGGACTATCC 

151 GTCCCCGATC CGCGCCCTGT TGGTGCGAAA GGACGTGCCG ATGACGGTTT 

201 GATTTACATT GTCAGCCTGC TCAATGCCGA CGGTTTTATC AAAATCCTGA

251 CGCATACCGT TAAAAATTTC ACCGGTTTCG CGCCGTTGGG AACGGTGTTG 

301 GTTTCTTTAT TGGGCGTGGG GATTGCGGAA AAATCGGGCT TGATTTCCGC 

351 ATTAATGCGC TTATTGCTCA CAAAATCGCC ACGCAAACTC ACTACTTTTA 

401 TGGTTGTTTT TACAGGGATT TTATCTAATA CCGCTTCTGA ATTGGGCTAT

451 GTCGTCCTAA TCCCTTTGTC CGCCATCATC TTTCATTCCC TCGGCCGCCA

501 TCCGCTTGCC GGTCTGGCTG CGGCTTTCGC CGGCGTTTCG GGCGGTTATT 

551 CGGCCAATCT GTTCTTAGGC ACAATCGATC CGCTCTTGGC AGGCATCACC 

601 CAACAGGCGG CGCAAATCAT CCATCCCGAC TACGTCGTAG GCCCTGAAGC 

651 CAACTGGTTT TTTATGGTAG CCAGTACGTT TGTGATTGCT TTGATTGGTT 

701 ATTTTGTTAC TGAAAAAATC GTCGAACCGC AATTGGGCCC TTATCAATCA 

751 GATTTGTCAC AAGAAGAAAA AGACATTCGG CATTCCAATG AAATCACGCC 

801 TTTGGAATAT AAAGGATTAA TTTGGGCTGG CGTGGTGTTT GTTGCCTTAT 
851 CCGCCCTATT GGCTTGGAGC ATCGTCCCTG CCGACGGTAT TTTGCGTCAT

901 CCTGAAACAG GATTGGTTTC CGGTTCGCCG TTTTTAAAAT CGATTGTTGT 

951 TTTTATTTTC TTGTTGTTTG CACTGCCGGG CATTGTTTAT GGCCGGGTAA 

1001 CCCGAAGTTT GCGCGGCGAA CAGGAAGTCG TTAATGCGAT GGCCGAATCG 

1051 ATGAGTACTC TGGGGCTTTA TTTGGTCATC ATCTTTTTTG CCGCACAGTT 

1101 TGTCGCATTT TTTAATTGGA CGAATATTGG GCAATATATT GCCGTTAAAG 

1151 GGGCGACGTT CTTAAAAGAA GTCGGCTTGG GCGGCAGCGT GTTGTTTATC 

1201 GGTTTTATTT TAATTTGTGC TTTTATCAAT CTGATGATAG GCTCCGCCTC 

1251 CGCGCAATGG GCGGTAACTG CGCCGATTTT CGTCCCTATG CTGATGTTGG 

1301 CCGGCTACGC GCCCGAAGTC ATTCAAGCCG CTTACCGCAT CGGTGATTCC 

1351 GTTACCAATA TTATTACGCC GATGATGAGT TATTTCGGGC TGATTATGGC 

1401 GACGGTGATC AAATACAAAA AAGATGCGGG CGTGGGTACG CTGATTTCTA 

1451 TGATGTTGCC GTATTCCGCT TTCTTCTTGA TTGCGTGGAT TGCCTTATTC 

1501 TGCATTTGGG TATTTGTTTT GGGCCTGCCC GTCGGTCCCG GCGCGCCCAC 

1551 ATTCTATCCC GCACCTTAA 

This corresponds to the amino acid sequence <SEQ ID 136; ORF12-1>:

1 MSQTDTQRDG RFLRTVEWLG NMLPHPVTLF IIFIVLLLIA SAVGAYFGLS

51 VPDPRPVGAK GRADDGLIYI VSLLNADGFI KILTHTVKNF TGFAPLGTVL

101 VSLLGVGIAE KSGLISALMR LLLTKSPRKL TTFMVVFTGI LSNTASELGY

151 VVLIPLSAII FHSLGRHPLA GLAAAFAGVS GGYSANLFLG TIDPLLAGIT

201 QQAAQIIHPD YVVGPEANWF FMVASTFVIA LIGYFVTEKI VEPQLGPYQS

251 DLSQEEKDIR HSNEITPLEY KGLIWAGVVF VALSALLAWS IVPADGILRH

301 PETGLVSGSP FLKSIVVFIF LLFALPGIVY GRVTRSLRGE QEVVNAMAES

351 MSTLGLYLVI IFFAAQFVAF FNWTNIGQYI AVKGATFLKE VGLGGSVLFI

401 GFILICAFIN LMIGSASAQW AVTAPIFVPM LMLAGYAPEV IQAAYRIGDS

451 VTNIITPMMS YFGLIMATVI KYKKDAGVGT LISMMLPYSA FFLIAWIALF

501 CIWVFVLGLP VGPGAPTFYP AP*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF12 shows 96.3% identity over a 320aa overlap with an ORF (ORF12a) from strain A of N. 
meningitidis: 

10 20 30 

orf 12 .pep AXXIIHPXXVVGPEANWFFMVASTFVIALI 

1 I I I I I I I I I I I I M I I I I I II II t I 
orf 12a AAAFAGVSGGYSANLFLGTIDPLLAGITQQAAQIIHPDYWGPEANWFFMVASTFVIALI 
180 190 200 210 220 230 



40 50 60 70 80 90 

orf 12 . pep GYFVTEKIVEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGWFVALSALLAWSIV 
II I I I I I I I I I I I I I I I I I I 1 I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I II I I I II 
orf 12a GYFVTEKIVEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGWFVALSALLAWSIV 
240 250 260 270 280 290 



100 110 120 130 140 150 

orf 12. pep PADGILRHPETGLVSGSPFLKSIWFIFLLFALPGIVYGRVTRSLRGEQEVVNAXAESMS 
I I I I I I I I I I I I II I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I 
orf 12a PADGILRHPETGLVSGSPFLKSIVVFIFLLFALPGIVYGRVTRSLRGEQEVVNAMAESMS 
300 310 320 330 340 350 



160 170 180 190 200 210 

orf 12 . pep TLXLXLXXIFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLM 
II I I I I II I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I 
orf 12a TLGLYLVIIFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLM 
360 370 380 390 400 410 



220 230 240 250 260 270 

orf 12 , pep IGSASAQWAVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVXXY 
I I I I M I M M I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I 
orf 12a IGSASAQWAVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKY 
420 430 440 450' 460 470 



280 290 300 310 320 






orfl2 pep KKDAGVGTLIXMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX 
t | M | | | M I I I I II I I I I I i I I M I i 11 ! I I I I M I M 1 I 1 I M M I ! I 
orfl2a KKDAGVGTLISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX 
480 490 500 510 520 

The complete length ORF12a nucleotide sequence <SEQ ID 137> is:



1 ATGAGTCAAA CCGATACGCA ACGGGACGGA CGATTTTTAC GCACAGTCGA

51 ATGGCTGGGC AATATGTTGC CGChCCCGGT TACGCTTTTT ATTATTTTCA

101 TTGTGTTATT GCTGATTGCC TCTGCCGCCG GTGCGTATTT CGGACTATCC

151 GTCCCCGATC CGCGCCCTGT TGGTGCGAAA GGACGTGCCG ATGACGGTTT

201 GATTCACGTT GTCAGCCTGC TCGATGCTGA CGGTTTGATC AAAATCCTGA

251 CGCATACCGT TAAAAATTTC ACCGGTTTCG CGCCGTTGGG AACGGTGTTG

301 GTTTCTTTAT TGGGCGTGGG GATTGCGGAA AAATCGGGCT TGATTTCCGC

351 ATTAATGCGC TTATTGCTCA CAAAATCTCC ACGCAAACTC ACTACTTTTA

401 TGGTTGTTTT TACAGGGATT TTATCTAATA CCGCTTCTGA ATTGGGCTAT

451 GTCGTCCTAA TCCCTTTGTC CGCCATCATC TTTCATTCCC TCGGCCGCCA

501 TCCGCTTGCC GGTCTGGCTG CGGCTTTCGC CGGCGTTTCG GGCGGTTATT

551 CGGCCAATCT GTTCTTAGGC ACAATCGATC CGCTCTTGGC AGGCATCACC

601 CAACAGGCGG CGCAAATCAT CCATCCCGAC TACGTCGTAG GCCCTGAAGC

651 CAACTGGTTT TTTATGGTAG CCAGTACGTT TGTGATTGCT TTGATTGGTT

701 ATTTTGTTAC TGAAAAAATC GTCGAACCGC AATTGGGCCC TTATCAATCA

751 GATTTGTCAC AAGAAGAAAA AGACATTCGA CATTCCAATG AAATCACGCC

801 TTTGGAATAT AAAGGATTAA TTTGGGCTGG CGTGGTGTTT GTTGCCTTAT

851 CCGCCCTATT GGCTTGGAGC ATCGTCCCTG CCGACGGTAT TTTGCGTCAT

901 CCTGAAACAG GATTGGTTTC CGGTTCGCCG TTTTTAAAAT CAATTGTTGT

951 TTTTATTTTC TTGTTGTTTG CACTGCCGGG CATTGTTTAT GGCCGGGTAA

1001 CCCGAAGTTT GCGCGGCGAA CAGGAAGTCG TTAATGCGAT GGCCGAATCG

1051 ATGAGTACTC TGGGGCTTTA TTTGGTCATC ATCTTTTTTG CCGCACAGTT

1101 TGTCGCATTT TTTAATTGGA CGAATATTGG GCAATATATT GCCGTTAAAG

1151 GGGCGACGTT CTTAAAAGAA GTCGGCTTGG GCGGCAGCGT GTTGTTTATC

1201 GGTTTTATTT TAATTTGTGC TTTTATCAAT CTGATGATAG GCTCCGCCTC

1251 CGCGCAATGG GCGGTAACTG CGCCGATTTT CGTCCCTATG CTGATGTTGG

1301 CCGGCTACGC GCCCGAAGTC ATTCAAGCCG CTTACCGCAT CGGTGATTCC

1351 GTTACCAATA TTATTACGCC GATGATGAGT TATTTCGGGC TGATTATGGC

1401 GACGGTGATC AAATACAAAA AAGATGCGGG CGTGGGTACG CTGATTTCTA

1451 TGATGTTGCC GTATTCCGCT TTCTTCTTGA TTGCGTGGAT TGCCTTATTC

1501 TGCATTTGGG TATTTGTTTT GGGCCTGCCC GTCGGTCCCG GCGCGCCCAC

1551 ATTCTATCCC GCACCTTAA



This encodes a protein having amino acid sequence <SEQ ID 138>: 



1 MSQTDTQRDG RFLRTVEWLG NMLPHPVTLF IIFIVLLLIA SAAGAYFGLS

51 VPDPRPVGAK GRADDGLIHV VSLLDADGLI KILTHTVKNF TGFAPLGTVL

101 VSLLGVGIAE KSGLISALMR LLLTKSPRKL TTFMVVFTGI LSNTASELGY

151 VVLIPLSAII FHSLGRHPLA GLAAAFAGVS GGYSANLFLG TIDPLLAGIT

201 QQAAQIIHPD YVVGPEANWF FMVASTFVIA LIGYFVTEKI VEPQLGPYQS

251 DLSQEEKDIR HSNEITPLEY KGLIWAGVVF VALSALLAWS IVPADGILRH

301 PETGLVSGSP FLKSIVVFIF LLFALPGIVY GRVTRSLRGE QEVVNAMAES

351 MSTLGLYLVI IFFAAQFVAF FNWTNIGQYI AVKGATFLKE VGLGGSVLFI

401 GFILICAFIN LMIGSASAQW AVTAPIFVPM LMLAGYAPEV IQAAYRIGDS

451 VTNIITPMMS YFGLIMATVI KYKKDAGVGT LISMMLPYSA FFLIAWIALF

501 CIWVFVLGLP VGPGAPTFYP AP*



ORF12a and ORF12-1 show 99.0% identity in 522 aa overlap:

                    10        20        30        40        50        60
orf12a.pep  MSQTDTQRDGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAAGAYFGLSVPDPRPVGAK
            ||||||||||||||||||||||||||||||||||||||||||:|||||||||||||||||
orf12-1     MSQTDTQRDGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAVGAYFGLSVPDPRPVGAK
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf12a.pep  GRADDGLIHVVSLLDADGLIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISALMR
            ||||||||::||||:|||:|||||||||||||||||||||||||||||||||||||||||
orf12-1     GRADDGLIYIVSLLNADGFIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISALMR
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf12a.pep  LLLTKSPRKLTTFMVVFTGILSNTASELGYVVLIPLSAIIFHSLGRHPLAGLAAAFAGVS
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf12-1     LLLTKSPRKLTTFMVVFTGILSNTASELGYVVLIPLSAIIFHSLGRHPLAGLAAAFAGVS
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf12a.pep  GGYSANLFLGTIDPLLAGITQQAAQIIHPDYVVGPEANWFFMVASTFVIALIGYFVTEKI
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf12-1     GGYSANLFLGTIDPLLAGITQQAAQIIHPDYVVGPEANWFFMVASTFVIALIGYFVTEKI
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf12a.pep  VEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIVPADGILRH
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf12-1     VEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIVPADGILRH
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf12a.pep  PETGLVSGSPFLKSIVVFIFLLFALPGIVYGRVTRSLRGEQEVVNAMAESMSTLGLYLVI
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf12-1     PETGLVSGSPFLKSIVVFIFLLFALPGIVYGRVTRSLRGEQEVVNAMAESMSTLGLYLVI
                   310       320       330       340       350       360

                   370       380       390       400       410       420
orf12a.pep  IFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLMIGSASAQW
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf12-1     IFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLMIGSASAQW
                   370       380       390       400       410       420

                   430       440       450       460       470       480
orf12a.pep  AVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGT
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf12-1     AVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGT
                   430       440       450       460       470       480

                   490       500       510       520
orf12a.pep  LISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX
            ||||||||||||||||||||||||||||||||||||||||||
orf12-1     LISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX
                   490       500       510       520
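
By way of illustration only, the identity figure quoted for ORF12a and ORF12-1 can be checked with a short script. The following Python sketch is not part of the original analysis: the function name `percent_identity` is hypothetical, the two strings are residues 1-120 as they appear in the alignment above, and it is assumed (as the alignment suggests) that residues 121-522 are identical in both sequences.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# Residues 1-120 of each sequence, copied from the alignment above.
orf12a = (
    "MSQTDTQRDGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAAGAYFGLSVPDPRPVGAK"
    "GRADDGLIHVVSLLDADGLIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISALMR"
)
orf12_1 = (
    "MSQTDTQRDGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAVGAYFGLSVPDPRPVGAK"
    "GRADDGLIYIVSLLNADGFIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISALMR"
)

# 1-based positions at which the two sequences differ.
mismatches = [i + 1 for i, (x, y) in enumerate(zip(orf12a, orf12_1)) if x != y]

# Assumption: residues 121-522 are identical, so these are the only mismatches.
overlap = 522
identity = 100.0 * (overlap - len(mismatches)) / overlap

print(mismatches)          # positions of the differing residues
print(round(identity, 1))  # percent identity over the full overlap
```

Under these assumptions the five differing positions in the first 120 residues account for the whole difference between the two sequences, which agrees with the 99.0% figure stated above.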

Homology with a predicted ORF from N. gonorrhoeae

ORF12 shows 92.5% identity over a 320aa overlap with a predicted ORF (ORF12.ng) from N.
gonorrhoeae:

orf12.pep                                AXXIIHPXXVVGPEANWFFMVASTFVIALI  30
                                         |  ||||  |||||||||||:|||||||||
orf12ng    AAAFAGVSGGYSANLFLGTIDPLLAGITQQAAQIIHPDYVVGPEANWFFMAASTFVIALI 232

orf12.pep  GYFVTEKIVEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIV  90
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf12ng    GYFVTEKIVEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIV 292

orf12.pep  PADGILRHPETGLVSGSPFLKSIVVFIFLLFALPGIVYGRVTRSLRGEQEVVNAXAESMS 150
           ||||||||||||||:|||||||||||||||||||||||||:|||||||:||||| |||||
orf12ng    PADGILRHPETGLVAGSPFLKSIVVFIFLLFALPGIVYGRITRSLRGEREVVNAMAESMS 352

orf12.pep  TLXLXLXXIFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLM 210
           || | |  |||||||||||||||||||||||||:|||:  ||||||||||||||||||||
orf12ng    TLGLYLVIIFFAAQFVAFFNWTNIGQYIAVKGAVFLKKFRLGGSVLFIGFILICAFINLM 412

orf12.pep  IGSASAQWAVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVXXY 270
           ||||||||||||||||||||||| ||:||||||||||||||||||||||||||||||  |
orf12ng    IGSASAQWAVTAPIFVPMLMLAGNAPQVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKY 472

orf12.pep  KKDAGVGTLIXMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAP 320
           |||||||||| |||||||||||||||||||||||||||||||:|||||:|
orf12ng    KKDAGVGTLISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGTPTFYPVP 522

The complete length ORF12ng nucleotide sequence <SEQ ID 139> is:

1 ATGAGTCAAA CCGACGCGCG TCGTAGCGGA CGATTTTTAC GCACAGTCGA 

51 ATGGCTGGGC AATATGTTGC CGCACCCGGT TACGCTTTTT ATTATTTTCA 

101 TTGTGTTATT GCTGATTGcc tctgCCGTCG GTGCGTATTT CGGACTATCC 

151 GTCCCCGATC CGCGTCCTGT TGGGGCGAAA GGACGTGCCG ATGACGGTTT 

201 GATTCACGTT GTCAGCCTGC TCGATGCCGA CGGTTTGATC AAAATCCTGA 

251 CGCATACCGT TAAAAATTTC ACCGGTTTCG CGCCGTTGGG AACGGTGTTG 

301 GTTTCTTTAT TGGGCGTGGG GATTGCGGAA AAATCGGGCT TGATTTCCGC 

351 ATTAATGCGC TTATTGCTCA CAAAATCCCC ACGCAAACTC ACTACTTTTA 

401 TGGTTGTTTT TACAGGGATT TTATCCAATA CGGCTTCTGA ATTGGGCTAT

451 GTCGTCCTAA TCCCTTTGTC CGCCGTCATC TTTCATTCGC TCGGCCGCCA

501 TCCGCTTGCC GGTTTGGCTG CGGCTTTCGC CGGCGTTTCG GGCGGTTATT 

551 CGGCCAATCT GTTCTTAGGC ACAATCGATC CGCTCTTGGC AGGCATCACC 

601 CAACAGGCGG CGCAAATCAT CCATCCCGAC TACGTCGTAG GCCCTGAAGC 

651 CAACTGGTTT TTTATGGCAG CCAGTACGTT TGTGATTGCT TTGATTGGTT 

701 ATTTTGTTAC TGAAAAAATC GTCGAACCGC AATTGGGCCC TTATCAATCA

751 GATTTGTCAC AAGAAGAAAA AGACATTCGG CATTCCAATG AAATCACGCC

801 TTTGGAATAT AAAGGATTAA TTTGGGCAGG CGTGGTGTTT GTTGCCTTAT 

851 CCGCCCTATT GGCTTGGAGC ATCGTCCCTG CCGACGGTAT TTTGCGTCAT 

901 CCTGAAACAG GATTGGTTGC CGGTTCGCCG TTTTTAAAAT CGATTGTTGT 

951 TTTTATTTTC TTGTTGTTTG CGCTGCCGGG CATTGTTTAT GGCCGGATAA 

1001 CCCGAAGTTT GCGCGGCGAA CGGGAAGTCG TTAATGCGAT GGCCGAATCG 

1051 ATGAGTACTT TGGGACTTTA TTTGGTCATC ATCTTTTTTG CCGCACAGTT 

1101 TGTCGCATTT TTTAATTGGA CGAATATTGG GCAATATATT GCCGTTAAAG 

1151 GGGCGGTGTT CTTAAAAGAA GTCGGCTTGG GCGGCAGTGT GTTGTTTATC 

1201 GGTTTTATTT TAATTTGTGC TTTTATCAAT CTGATGATAG GCTCCGCCTC

1251 CGCGCAATGG GCGGTAACTG CGCCGATTTT CGTCCCTATG CTGATGTTGG 

1301 CCGGCTACGC GCCCGAAGTC ATTCAAGCCG CTTACCGCAT CGGTGATTCC 

1351 GTTACCAATA TTATTACGCC GATGATGAGT TATTTCGGGC TGATTATGGC

1401 GACGGTAATC AAATACAAAA AAGATGCGGG CGTAGGCACG CTGATTTCTA

1451 TGATGTTGCC GTATTCCGCT TTCTTCTTAA TTGCATGGAT CGCCTTATTC

1501 TGCATTTGGG TATTTGTTTT GGGTCTGCCC GTCGGTCCCG GCACACCCAC 

1551 ATTCTATCCG GTGCCTTAA 

This encodes a protein having amino acid sequence <SEQ ID 140>: 

1 MSQTDARRSG RFLRTVEWLG NMLPHPVTLF IIFIVLLLIA SAVGAYFGLS

51 VPDPRPVGAK GRADDGLIHV VSLLDADGLI KILTHTVKNF TGFAPLGTVL

101 VSLLGVGIAE KSGLISALMR LLLTKSPRKL TTFMVVFTGI LSNTASELGY

151 VVLIPLSAVI FHSLGRHPLA GLAAAFAGVS GGYSANLFLG TIDPLLAGIT

201 QQAAQIIHPD YVVGPEANWF FMAASTFVIA LIGYFVTEKI VEPQLGPYQS

251 DLSQEEKDIR HSNEITPLEY KGLIWAGVVF VALSALLAWS IVPADGILRH

301 PETGLVAGSP FLKSIVVFIF LLFALPGIVY GRITRSLRGE REVVNAMAES

351 MSTLGLYLVI IFFAAQFVAF FNWTNIGQYI AVKGAVFLKK FRLGGSVLFI

401 GFILICAFIN LMIGSASAQW AVTAPIFVPM LMLAGNAPQV IQAAYRIGDS

451 VTNIITPMMS YFGLIMATVI KYKKDAGVGT LISMMLPYSA FFLIAWIALF

501 CIWVFVLGLP VGPGTPTFYP VP*

ORF12ng shows 97.1% identity in 522 aa overlap with ORF12-1:

                     10        20        30        40        50        60
orf12-1.pep  MSQTDTQRDGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAVGAYFGLSVPDPRPVGAK
             |||||::|:|||||||||||||||||||||||||||||||||||||||||||||||||||
orf12ng      MSQTDARRSGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAVGAYFGLSVPDPRPVGAK
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf12-1.pep  GRADDGLIYIVSLLNADGFIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISALMR
             ||||||||::||||:|||:|||||||||||||||||||||||||||||||||||||||||
orf12ng      GRADDGLIHVVSLLDADGLIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISALMR
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf12-1.pep  LLLTKSPRKLTTFMVVFTGILSNTASELGYVVLIPLSAIIFHSLGRHPLAGLAAAFAGVS
             ||||||||||||||||||||||||||||||||||||||:|||||||||||||||||||||
orf12ng      LLLTKSPRKLTTFMVVFTGILSNTASELGYVVLIPLSAVIFHSLGRHPLAGLAAAFAGVS
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf12-1.pep  GGYSANLFLGTIDPLLAGITQQAAQIIHPDYVVGPEANWFFMVASTFVIALIGYFVTEKI
             ||||||||||||||||||||||||||||||||||||||||||:|||||||||||||||||
orf12ng      GGYSANLFLGTIDPLLAGITQQAAQIIHPDYVVGPEANWFFMAASTFVIALIGYFVTEKI
                    190       200       210       220       230       240

                    250       260       270       280       290       300
orf12-1.pep  VEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIVPADGILRH
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf12ng      VEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIVPADGILRH
                    250       260       270       280       290       300

                    310       320       330       340       350       360
orf12-1.pep  PETGLVSGSPFLKSIVVFIFLLFALPGIVYGRVTRSLRGEQEVVNAMAESMSTLGLYLVI
             ||||||:|||||||||||||||||||||||||:|||||||:|||||||||||||||||||
orf12ng      PETGLVAGSPFLKSIVVFIFLLFALPGIVYGRITRSLRGEREVVNAMAESMSTLGLYLVI
                    310       320       330       340       350       360

                    370       380       390       400       410       420
orf12-1.pep  IFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLMIGSASAQW
             |||||||||||||||||||||||||:||||||||||||||||||||||||||||||||||
orf12ng      IFFAAQFVAFFNWTNIGQYIAVKGAVFLKEVGLGGSVLFIGFILICAFINLMIGSASAQW
                    370       380       390       400       410       420

                    430       440       450       460       470       480
orf12-1.pep  AVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGT
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf12ng      AVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGT
                    430       440       450       460       470       480

                    490       500       510       520
orf12-1.pep  LISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX
             ||||||||||||||||||||||||||||||||||:|||||:|
orf12ng      LISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGTPTFYPVPX
                    490       500       510       520

In addition, ORF12ng shows significant homology with a hypothetical protein from E. coli:

sp|P46133|YDAH_ECOLI HYPOTHETICAL 55.1 KD PROTEIN IN OGT-DBPA INTERGENIC REGION
>gi|1787597 (AE000231) hypothetical protein in ogt 5' region [Escherichia coli]
Length = 510
Score = 329 bits (835), Expect = 2e-89
Identities = 178/507 (35%), Positives = 281/507 (55%), Gaps = 15/507 (2%)

Query: 8   RSGRFLRTVEWLGNMLPHPVTXXXXXXXXXXXASAVGAYFGLSVPDPRPVGAKGRADDGL 67
           +SG+ VE +GN +PHP +A+ + FG+S +P D
Sbjct: 13  QSGKLYGWVERIGNKVPHPFLLFIYLIIVLMVTTAILSAFGVSAKNP TDGTP 64

Query: 68  IHVVSLLDADGLIKILTHTVKNFTGFAPXXXXXXXXXXXXIAEKSGLISALMRLLLTKSP 127
           + V +LL +GL L + +KNF+GFAP +AE+ GL+ ALM + +
Sbjct: 65  VWKNLLSVEGLHWFLPNVIKNFSGFAPLGAILALVLGAGLAERVGLLPALMVKMASHVN 124

Query: 128 RKLTTFMVVFTGILSNTASELGYVVLIPLSAVIFHSLGRHPLAGLAAAFAGVSGGYSANL 187
           + ++MV+F S+ +S+ V++ P+ A+IF ++GRHP+AGL AA AGV G++ANL
Sbjct: 125

Query: 188
           + T D LL+GI+ +AA +P V NW+FMA+S V+ ++G +T+KI+EP+LG
Sbjct: 185

Query: 248
           +Q + ++ + + S GL AGW + A +A ++P +GILR P V
Sbjct: 245 WQGNSDEKLQTLTESQRF GLRIAGWSLLFIAAIALMVIPQNGILRDPINHTVM 298

Query: 308
           SPF+K IV I L F + + YG TR++R + ++ + M E M + ++
Sbjct: 299 PSPFIKGIVPLIILFFFWSLAYGIATRTIRRQADLPHLMIEPMKEMAGFIVMVFPLAQF 358

Query: 368 XXXXNWTNIGQYIAVKGAVFLKEVGLGGSVLFIGFILICAFINLMIGSASAQWAVTAPIF 427
           NW+N+G++IAV L+ GL G F+G L+ +F+ + I S SA W++ APIF
Sbjct: 359 VAMFNWSNMGKFIAVGLTDILESSGLSGIPAFVGLALLSSFLCMFIASGSAIWSILAPIF 418

Query: 428 VPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGTLISMMLP 487
           VPM ML G+ P Q +RI DS + P+ + L + + +YK DA +GT S++LP
Sbjct: 419 VPMFMLLGFHPAFAQILFRIADSSVLPLAPVSPFVPLFLGFLQRYKPDAKLGTYYSLVLP 478

Query: 488 YSAFFLIAWIALFCIWVFVLGLPVGPG 514
           Y FL+ W+ + W +++GLP+GPG
Sbjct: 479 YPLIFLWWLLMLLAW-YLVGLPIGPG 504
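
For illustration, the percentages printed in the BLAST summary above can be recomputed from the raw counts. The Python sketch below is not part of the original analysis, and the helper name `blast_percent` is hypothetical. It assumes that the program truncates rather than rounds, an assumption that is consistent with the gap figure (15/507 is about 2.96%, reported as 2%) but is not stated in the output itself.

```python
def blast_percent(count: int, total: int) -> int:
    """Integer percentage, truncated toward zero (assumed BLAST behaviour)."""
    return 100 * count // total


# Raw counts copied from the summary line of the alignment above.
stats = {
    "Identities": (178, 507),
    "Positives": (281, 507),
    "Gaps": (15, 507),
}

for name, (count, total) in stats.items():
    print(f"{name} = {count}/{total} ({blast_percent(count, total)}%)")
```

The three values obtained in this way agree with the 35%, 55% and 2% shown in the summary line.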



Based on this analysis, including the presence of several putative transmembrane domains and the
predicted actinin-type actin-binding domain signature (shown in bold) in the gonococcal protein, 
it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could 
be useful antigens for vaccines or diagnostics, or for raising antibodies. 
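
The text does not state how the putative transmembrane domains were predicted. Purely as an illustration of one common approach, the Python sketch below computes a Kyte-Doolittle hydropathy profile over residues 21-50 of <SEQ ID 140>. The Kyte-Doolittle scale, the 19-residue window and the 1.6 threshold are conventional choices assumed here, not taken from the specification, and the function name `hydropathy_windows` is hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full-length window along the sequence."""
    return [
        sum(KD[res] for res in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


# Residues 21-50 of SEQ ID 140 (ORF12ng), copied from the listing above.
segment = "NMLPHPVTLFIIFIVLLLIASAVGAYFGLS"
first_residue = 21

scores = hydropathy_windows(segment)

# Windows whose mean hydropathy exceeds the assumed threshold of 1.6
# are reported as (start residue, score) candidates for a membrane span.
candidates = [
    (first_residue + i, round(score, 2))
    for i, score in enumerate(scores)
    if score > 1.6
]
print(candidates)
```

A window that clears the threshold is only suggestive of a membrane-spanning segment; dedicated prediction tools would be needed to support the statement made above.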

Example 17 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 141>:

1 . . ACAGCCGGCG CAGCAGGTTn CnCGGTCTTC GTTTTCGTAA CGGACAGTCA 

51 GGTGGAGGTG TTCGGGAACA TCCAGACCGC AGTGGAAACA GGTTTTTTTC 

101 ATGGCATTTC GGTTTCGTCT GTGTTTGGTG CGGCGGCACA AGACTCGGCA 

151 ATgGCTTCGC GCAGTGCGTC TATACCGGTA TTTTCAGCAA CGGAAATGCG 

201 GACGGcGgCA ATTTTTCCCG CAGCGTCGCG CCATATGCCC GTGTTTTgTT

251 CTTCAGACGG CAGCAGGTCG GTTTTGTTGT ACACCTTgAT GCACGGAaTA 

301 TCGCCGGCAT GGATTTCTTG CAGTACGTTT TCCACGTCTT CAATCTGCTG 

351 TCCGCTGTTC GGAGCGGCGG CATCGACGAC GTGCAGCAGC ACATCgGcTT 

401 gCGCGGTTTC TTCCAGCGTG GCgGAAAAGG CGGAAATCAG TTTgTGCGGC

451 agATyGCTnA CGAATCCGAC GGTATCGGTC AGGATAATGC TGCATTCGGG

501 ACT. . 

This corresponds to the amino acid sequence <SEQ ID 142; ORF14>: 

1 . . TAGAAGXXVF VFVTDSQVEV FGNIQTAVET GFFHGISVSS VFGAAAQDSA 

51 MASRSASIPV FSATEMRTAA IFPAASRHMP VFCSSDGSRS VLLYTLMHGI 

101 SPAWISCSTF STSSICCPLF GAAASTTCSS TSACAVSSSV AEKAEISLCG

151 RXLTNPTVSV RIMLHSG. . 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF14 shows 94.0% identity over a 167aa overlap with an ORF (ORF14a) from strain A of N.
meningitidis:

10 20 30 

orf 14 .pep T AG AAGXXV FV FVT D S QVE V FGN I QT AVET 

1:1111 I I I I M I : 1 : : I I M : I I II I 
orf 14a GRQLGFLRVGGALFVITAQARVNNALCDCLTTGAAGFAVFVFVTDGQMQVFGNVQPAVET 
45 150 160 170 180 190 200 

40 50 60 70 80 90 

orf 14 . pep GFFHGI S VS SVFGAAAQDSAMASRSAS I PVFS ATEMRTAAI FPAASRHMPVFCS S DGSRS 
I I I i I I I I I I I I I I I I f i i I II I I I I f I I I I i I I I I I I I I I I I I I I I II I I I I I I | ) | 1 
50 orf 14a GFFHGI SVSSVFGAAAQYSAMASRSASIPVFSATEMRTAAIFPAASRHMPVFCSS DGSRS 

210 220 230 240 250 260 
100 110 120 130 140 150

orfl4 oep VLLYTLMHGISPAWISCSTFSTSSICCPLFGAAASTTCSSTSACAVSSSVAEKAEISLCG 

p p iiiiiiiimiiimiitiiMMiMmiitntiimmmiiiitiiiiii 

orf 14a VLLYTLMHGISPAWISCSTFSTSSICCPLFGAAASTTCSSTSACAVSSSVAEKAEISLCG 
5 270 280 290 300 310 320 

160 

orf 14 . pep RXLTNPTVSVRIMLHSG 
t 111111111111111 

10 orf 14a RSLTNPTVSVRIMLHSGLMYSRRAWSSVAKSWSFAYMPDLVSRLNRLDLPTLVX 

330 340 350 360 370 380 

The complete length ORF14a nucleotide sequence <SEQ ID 143> is: 

1 ATGGAGGATT TGCAGGAAAT CGGGTTCGAT GTCGCCGCCG TAAAGGTAGG 

51 TCGGCAGCGC GAACATCATC GTCTGCATCA TCCCCAGCCC GGCAACGGCG 

101 AGGCGGACGA TGTATTGTTT GCGTTCTTTT TGGTTGGCGG CTTCGATTTT

151 TTGCGCGTCA TAGGGTGCGG CGGTGTAGCC TATCTGCCTG ATTTTCAACA 

201 GAATGTCGGA AAGGCGGATT TTGCCGTCGT CCCAGACGAC GCGGCAGCGG 

251 TGCGTGCTGT AATTGAGGTC GATGCGGACG ATGCCGTCTG TACGCAAAAG 

301 CTGCTGTTCG ATCAGCCAGA CGCAGGCGGC GCAGGTGATG CCGCCGAGCA 

351 TTAAAACCGC CTCGCGCGTG CCGCCGTGGG TTTCCACAAA GTCGGACTGG

401 ACTTCGGGCA GGTCGTACAG GCGGATTTGG TCGAGGATTT CTTGGGGCGG 

451 CAGCTCGGTT TTTTGCGCGT CGGCGGTGCG TTGTTTGTAA TAACTGCCCA 

501 AGCCCGCGTC AATAATGCTT TGTGCGACTG CCTGACAACC GGCGCAGCAG 

551 GTTTCGCGGT CTTCGTTTTC GTAACGGACG GTCAGATGCA GGTTTTCGGG 

601 AACGTCCAGC CCGCAGTGGA AACAGGTTTT TTTCATGGCA TTTCGGTTTC

651 GTCTGTGTTT GGTGCGGCGG CACAATACTC GGCAATGGCT TCGCGCAGTG 

701 CGTCTATACC GGTATTTTCA GCAACGGAAA TGCGGACGGC GGCAATTTTT 

751 CCCGCAGCGT CGCGCCATAT GCCCGTGTTT TGTTCTTCAG ACGGCAGCAG 

801 GTCGGTTTTG TTGTACACCT TGATGCACGG AATATCGCCG GCATGGATTT 

851 CTTGCAGTAC GTTTTCCACG TCTTCAATCT GCTGTCCGCT GTTCGGAGCG

901 GCGGCATCGA CGACGTGCAG CAGCACATCG GCTTGCGCGG TTTCTTCCAG 

951 CGTGGCGGAA AAGGCGGAAA TCAGTTTGTG CGGCAGATCG CTGACGAATC 

1001 CGACGGTATC GGTCAGGATA ATGCTGCATT CGGGACTGAT GTACAGCCGC 

1051 CGCGCCGTCG TGTCGAGTGT GGCGAAAAGC TGGTCTTTCG CATATATGCC 

1101 CGACTTGGTC AGCCGGTTGA ACAGACTGGA TTTGCCGACA TTGGTATAG

This encodes a protein having amino acid sequence <SEQ ID 144>: 

1 MEDLQEIGFD VAAVKVGRQR EHHRLHHPQP GNGEADDVLF AFFLVGGFDF 

51 LRVIGCGGVA YLPDFQQNVG KADFAVVPDD AAAVRAVIEV DADDAVCTQK

101 LLFDQPDAGG AGDAAEH*NR LARAAVGFHK VGLDFGQVVQ ADLVEDFLGR

151 QLGFLRVGGA LFVITAQARV NNALCDCLTT GAAGFAVFVF VTDGQMQVFG

201 NVQPAVETGF FHGISVSSVF GAAAQYSAMA SRSASIPVFS ATEMRTAAIF 

251 PAASRHMPVF CSSDGSRSVL LYTLMHGISP AWISCSTFST SSICCPLFGA 

301 AASTTCSSTS ACAVSSSVAE KAEISLCGRS LTNPTVSVRI MLHSGLMYSR 

351 RAVVSSVAKS WSFAYMPDLV SRLNRLDLPT LV* 

It should be noted that this sequence includes a stop codon at position 118.
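
As an illustration of how the internal stop codon can be located, the following Python sketch translates nucleotides 301-360 of <SEQ ID 143> with the standard genetic code. It is not part of the original analysis: the names `CODON_TABLE`, `translate`, `fragment` and `first_codon` are hypothetical, and the fragment is assumed to begin in frame at codon 101, as the numbering of the listing implies.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string; unknown codons become 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


# Nucleotides 301-360 of SEQ ID 143, copied from the listing above.
fragment = (
    "CTGCTGTTCG" "ATCAGCCAGA" "CGCAGGCGGC"
    "GCAGGTGATG" "CCGCCGAGCA" "TTAAAACCGC"
)
first_codon = 101  # nucleotide 301 is the first base of codon 101

protein = translate(fragment)
stop_position = first_codon + protein.index("*")
print(protein, stop_position)
```

The translated fragment reproduces residues 101-120 of <SEQ ID 144>, and the first stop symbol falls at codon 118, in agreement with the note above.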
Homology with a predicted ORF from N. gonorrhoeae

ORF14 shows 89.8% identity over a 167aa overlap with a predicted ORF (ORF14.ng) from N. 
gonorrhoeae: 

orf 14 .pep TAGAAGXXVFVFVTDSQVEVFGNIQTAVET 30 

50 I I I I I I I : I I : I : I :: I I I I : I I I I I 

orfl4ng GRQFGFFRVGGASFVITAQAGIDDALCDCLTADAAGFAVFAFVADGQMQVFGNVQPAVET 208 

orf 14. pep GFFHGISVSSVFGAAAQDSAMASRSASIPVFSATEMRTAAIFPAASRHMPVFCSSDGSRS 90 
I I I I I I I I II I I I I I I I I I I I I I I I I I I I I II I I I I I I I I II II I I I I I I I I I I I I II I 
55 orf!4ng GFFHGI SVSS VFGAAAQYSAMASRSAS I PVFSATEMRTAAI FPAASRHMPVFCS SDGSRS 268 



60 



orf 14 . pep VLLYTLMHGISPAWISCSTFSTSSICCPLFGAAASTTCSSTSACAVSSSVAEKAEISLCG 150 

I I M I I I I I I I I I I I II I I I I I I I I I I II I I I I I I I I I I I I I : I I I : I I I I I I I I I I I 
orfl4ng VLLYTLMHGISWAWISCSTFSTSSICCPLFRAAASTTCSSTSACTVSSKVAEKAEISLCG 328 
orf14.pep RXLTNPTVSVRIMLHSG 167
I I I H ! i t I It t I I : I 

orfl4ng rslTNPTVSVRIMLHAGLMYSRRAWSRVAKSWSFAYMPDLVSRLNRLDLPTLV 382 

The complete length ORF14ng nucleotide sequence <SEQ ID 145> is predicted to encode a protein 
having amino acid sequence <SEQ ID 146>: 

1 MEDLQEIGFD VAAVKVGRQR EHHRLHHTQS GNGKADDVLF AFFLVGGFDF

51 LRVIGCGGVA CLPDFQQNVG EADFAVVPDD AAAVRAVIEV DADDAVCAQK

101 LLFDQPDAGG AGNAAEHQHC FVRAIMGFHK VGLDFGQVVQ ADLVEDFLGR

151 QFGFFRVGGA SFVITAQAGI DDALCDCLTA DAAGFAVFAF VADGQMQVFG 

201 NVQPAVETGF FHGISVSSVF GAAAQYSAMA SRSASIPVFS ATEMRTAAIF 

251 PAASRHMPVF CSSDGSRSVL LYTLMHGISW AWISCSTFST SSICCPLFRA 

301 AASTTCSSTS ACTVSSKVAE KAEISLCGRS LTNPTVSVRI MLHAGLMYSR 

351 RAVVSRVAKS WSFAYMPDLV SRLNRLDLPT LV*

Based on the putative transmembrane domain in the gonococcal protein, it is predicted that the 
proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for
vaccines or diagnostics, or for raising antibodies. 



Example 18 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 147>: 

1 . . GGCCATTACT CCGACCGCAC TTGGAAGCCG CGTTTGGNCG GCCGCCGTCT 
51 GCCGTATCTG CTTTATGGCA CGCTGATTGC GGTTATTGTG ATGATTTTGA 
101 TGCCGAACTC GGGCAGCTTC GGTTTCGGCT ATGCGTCGCT GGCGGCTTTG 
151 TCGTTCGGCG CGCTGATGAT TGCGCTGTTA GACGTGTCGT CAAATATGGC 
201 GATGCAGCCG TTTAAGATGA TGGTCGGCGA CATGGTCAAC GAGGAGCAGA 
251 AAA . NTACGC CTACGGGATT CAAAGTTTCT TAGCAAATAC GGGCGCGGTC 
301 GTGGCGGCGA TTCTGCCGTT TGTGTTTGCG TATATCGGTT TGGCGAACAC 
351 CGCCGANAAA GGCGTTGTGC CGCAGACCGT GGTCGTGGCG TTTTATGTGG 
401 GTGCGGCGTT GCTGGTGATT ACCAGCGCGT TCACGATTTT CAAAGTGAAG 
451 GAATACGANC CGGAAACCTA CGCCCGTTAC CACGGCATCG ATGTCGCCGC 
501 GAATCAGGAA AAAGCCAACT GGATCGCACT CTTAAAA.CC GCGC . . 

This corresponds to the amino acid sequence <SEQ ID 148; ORF16>: 

1 . . GHYSDRTWKP RLXGRRLPYL LYGTLIAVIV MILMPNSGSF GFGYASLAAL 
51 SFGALMIALL DVSSNMAMQP FKMMVGDMVN EEQKXYAYGI QSFLANTGAV 
101 VAAILPFVFA YIGLANTAXK GVVPQTVVVA FYVGAALLVI TSAFTIFKVK
151 EYXPETYARY HGIDVAANQE KANWIALLKX A. . 

Further work revealed the complete nucleotide sequence <SEQ ID 149>: 

1 ATGTCGGAAT ATACGCCTCA AACAGCAAAA CAAGGTTTGC CCGCGCTGGC 

51 AAAAAGCACG ATTTGGATGC TCAGTTTCGG CTTTCTCGGC GTTCAGACGG 

101 CCTTTACCCT GCAAAGCTCG CAAATGAGCC GCATTTTTCA AACGCTAGGC 

151 GCAGACCCGC ACAATTTGGG CTGGTTTTTC ATCCTGCCGC CGCTGGCGGG 

201 GATGCTGGTG CAGCCGATTG TCGGCCATTA CTCCGACCGC ACTTGGAAGC 

251 CGCGTTTGGG CGGCCGCCGT CTGCCGTATC TGCTTTATGG CACGCTGATT 

301 GCGGTTATTG TGATGATTTT GATGCCGAAC TCGGGCAGCT TCGGTTTCGG 

351 CTATGCGTCG CTGGCGGCTT TGTCGTTCGG CGCGCTGATG ATTGCGCTGT 

401 TAGACGTGTC GTCAAATATG GCGATGCAGC CGTTTAAGAT GATGGTCGGC 

451 GACATGGTCA ACGAGGAGCA GAAAGGCTAC GCCTACGGGA TTCAAAGTTT 

501 CTTAGCAAAT ACGGGCGCGG TCGTGGCGGC GATTCTGCCG TTTGTGTTTG 

551 CGTATATCGG TTTGGCGAAC ACCGCCGAGA AAGGCGTTGT GCCGCAGACC 

601 GTGGTCGTGG CGTTTTATGT GGGTGCGGCG TTGCTGGTGA TTACCAGCGC 

651 GTTCACGATT TTCAAAGTGA AGGAATACGA TCCGGAAACC TACGCCCGTT 

701 ACCACGGCAT CGATGTCGCC GCGAATCAGG AAAAAGCCAA CTGGATCGAA 

751 CTCTTGAAAA CCGCGCCTAA GGCGTTTTGG ACGGTTACTT TGGTGCAATT 

801 CTTCTGCTGG TTCGCCTTCC AATATATGTG GACTTACTCG GCAGGCGCGA 

851 TTGCGGAAAA CGTCTGGCAC ACCACCGATG CGTCTTCCGT AGGTTATCAG 

901 GAGGCGGGTA ACTGGTACGG CGTTTTGGCG GCGGTGCAGT CGGTTGCGGC 
951 GGTGATTTGT TCGTTTGTAT TGGCGAAAGT GCCGAATAAA TACCATAAGG

1001 CGGGTTATTT CGGCTGTTTG GCTTTGGGCG CGCTCGGCTT TTTCTCCGTT 

1051 TTCTTCATCG GCAACCAATA CGCGCTGGTG TTGTCTTATA CCTTAATCGG 

1101 CATCGCTTGG GCGGGCATTA TCACTTATCC GCTGACGATT GTGACCAACG 

1151 CCTTGTCGGG CAAGCATATG GGCACTTACT TGGGCTTGTT TAACGGCTCT 

1201 ATCTGTATGC CTCAAATCGT CGCTTCGCTG TTGAGTTTCG TGCTTTTCCC 

1251 TATGCTGGGC GGCTTGCAGG CCACTATGTT CTTGGTAGGG GGCGTCGTCC 

1301 TGCTGCTGGG CGCGTTTTCC GTGTTCCTGA TTAAAGAAAC ACACGGCGGG 

1351 GTTTGA 

This corresponds to the amino acid sequence <SEQ ID 150; ORF16-1>:

1 MSEYTPQTAK QGLPALAKST IWMLSFGFLG VQTAFTLQSS QMSRIFQTLG

51 ADPHNLGWFF ILPPLAGMLV QPIVGHYSDR TWKPRLGGRR LPYLLYGTLI

101 AVIVMILMPN SGSFGFGYAS LAALSFGALM IALLDVSSNM AMQPFKMMVG

151 DMVNEEQKGY AYGIQSFLAN TGAVVAAILP FVFAYIGLAN TAEKGVVPQT

201 VVVAFYVGAA LLVITSAFTI FKVKEYDPET YARYHGIDVA ANQEKANWIE

251 LLKTAPKAFW TVTLVQFFCW FAFQYMWTYS AGAIAENVWH TTDASSVGYQ

301 EAGNWYGVLA AVQSVAAVIC SFVLAKVPNK YHKAGYFGCL ALGALGFFSV

351 FFIGNQYALV LSYTLIGIAW AGIITYPLTI VTNALSGKHM GTYLGLFNGS

401 ICMPQIVASL LSFVLFPMLG GLQATMFLVG GVVLLLGAFS VFLIKETHGG

451 V*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF16 shows 96.7% identity over a 181aa overlap with an ORF (ORF16a) from strain A of N.
meningitidis: 

10 20 30 

orfl6 pep GHYSDRTWKPRLXGRR LPYLLYGTLIAVIV 

I I I I I I I I I I I I I I I I II I I I ! I M I I I I 
orfl6a 1FQTLGADPHSLGW FFILPPLAGMLVQPIVG HYSDRTWKPRLGGRR LPYLLYGTLIAVIV 
50 60 70 80 90 100 



40 50 60 70 80 90 

orf 16 . pep MILMPNSGSFGFGYA S L AAL S FG ALM I AL L D V S SNMAMQ P FKMM V G DMVNEE QKX YA Y G I 
I I I I I II M I I I I I M I I I ! I I I I I I I I I I 1 I I ! I 1 I I I I 1 I 1 1 i I I II I I I I I I I I I I 
orf 16a MILMPNSGSFGFGY ASLAALSFGALMIALLDV SSNMAMQFFKMMVGDMVNEEQKGYAYGI 
110 120 130 140 150 160 



100 110 120 130 140 150 

orf 16 .pep QSFLANTG AVVAAILPFVFAYIGLA NTAXKGVVPQT VVVAFYVGAALLVITSA FTIFKVK 
I I I I I I I I I I I I I I I I t I I I I I II I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I 
orf 16a QSFLANTG AWAAILPFVFAY1GLA KTAEKGWPQT VWAFYVGAALLVITSA FTIFKVK 
170 180 190 200 210 220 



160 170 180 

orf 1 6 . pep EYXPETYARYHGI DVAANQEKANWIALLKXA 
II I I I I I I t I I I II I II M I I I I I 111:1 
orf 16a EYNPETYARYHGIDVAANQEKANWIELLKTAPKAFWTVTLVQFFCWFAFQYMWTYSAGAI 
230 240 250 260 270 280 



orf 1 6a AEN VWHTT DAS S VG YQE AGNW YG VLAAVQS VAAV ICS FVL AKV PNKYHKAG Y FGC LALGA 

290 300 310 320 330 340 . 

The complete length ORF16a nucleotide sequence <SEQ ID 151> is:



1 ATGTCGGAAT ATACGCCTCA AACAGCAAAA CAAGGTTTGC CCGCGCTGGC 

51 AAAAAGCACG ATTTGGATGC TCAGTTTCGG CTTTCTCGGC GTTCAGACGG 

101 CCTTTACCCT GCAAAGCTCG CAGATGAGCC GCATCTTCCA GACGCTCGGT

151 GCCGATCCGC ACAGCCTCGG CTGGTTCTTT ATCCTGCCGC CGCTGGCGGG 

201 GATGCTGGTG CAGCCGATTG TCGGCCATTA CTCCGACCGC ACTTGGAAGC 

251 CGCGTTTGGG CGGCCGCCGT CTGCCGTATC TGCTTTATGG CACGCTGATT 

301 GCGGTTATTG TGATGATTTT GATGCCGAAC TCGGGCAGCT TCGGTTTCGG 

351 CTATGCGTCG CTGGCGGCTT TGTCGTTCGG CGCGCTGATG ATTGCGCTGT 

401 TAGACGTGTC GTCAAATATG GCGATGCAGC CGTTTAAGAT GATGGTCGGC 
451 GACATGGTCA ACGAGGAGCA GAAAGGCTAC GCCTACGGGA TTCAAAGTTT

501 CTTAGCGAAT ACGGGCGCGG TCGTGGCGGC GATTCTGCCG TTTGTGTTTG 

551 CGTATATCGG TTTGGCGAAC ACCGCCGAGA AAGGCGTTGT GCCGCAGACC 

601 GTGGTCGTGG CGTTTTATGT GGGTGCGGCG TTGCTGGTGA TTACCAGCGC 

651 GTTCACGATT TTCAAAGTGA AGGAATACAA TCCGGAAACC TACGCCCGTT 

701 ACCACGGCAT CGATGTCGCC GCGAATCAGG AAAAAGCCAA CTGGATCGAA 

751 CTCTTGAAAA CCGCGCCTAA GGCGTTTTGG ACGGTTACTT TGGTGCAATT 

801 CTTCTGCTGG TTCGCCTTCC AATATATGTG GACTTACTCG GCAGGCGCGA 

851 TTGCGGAAAA CGTCTGGCAC ACCACCGATG CGTCTTCCGT AGGTTATCAG 

901 GAGGCGGGTA ACTGGTACGG CGTTTTGGCG GCGGTGCAGT CGGTTGCGGC 

951 GGTGATTTGT TCGTTTGTAT TGGCGAAAGT GCCGAATAAA TACCATAAGG 

1001 CGGGTTATTT CGGCTGTTTG GCTTTGGGCG CGCTCGGCTT TTTCTCCGTT 

1051 TTCTTCATCG GCAACCAATA CGCGCTGGTG TTGTCTTATA CCTTAATCGG 

1101 CATCGCTTGG GCGGGCATTA TCACTTATCC GCTGACGATT GTGACCAACG 

1151 CCTTGTCGGG CAAGCATATG GGCACTTACT TGGGCCTGTT TAACGGCTCT 

1201 ATCTGTATGC CGCAAATCGT CGCTTCGCTG TTGAGTTTCG TGCTTTTCCC 

1251 TATGCTGGGC GGCTTGCAGG CCACTATGTT CTTGGTAGGG GGCGTCGTCC 

1301 TGCTGCTGGG CGCGTTTTCC GTGTTCCTGA TTAAAGAAAC ACACGGCGGG 

1351 GTTTGA 

This encodes a protein having amino acid sequence <SEQ ID 152>: 

1 MSEYTPQTAK QGLPALAKST IWMLSFGFLG VQTAFTLQSS QMSRIFQTLG

51 ADPHSLGWFF ILPPLAGMLV QPIVGHYSDR TWKPRLGGRR LPYLLYGTLI

101 AVIVMILMPN SGSFGFGYAS LAALSFGALM IALLDVSSNM AMQPFKMMVG

151 DMVNEEQKGY AYGIQSFLAN TGAVVAAILP FVFAYIGLAN TAEKGVVPQT

201 VVVAFYVGAA LLVITSAFTI FKVKEYNPET YARYHGIDVA ANQEKANWIE

251 LLKTAPKAFW TVTLVQFFCW FAFQYMWTYS AGAIAENVWH TTDASSVGYQ

301 EAGNWYGVLA AVQSVAAVIC SFVLAKVPNK YHKAGYFGCL ALGALGFFSV

351 FFIGNQYALV LSYTLIGIAW AGIITYPLTI VTNALSGKHM GTYLGLFNGS

401 ICMPQIVASL LSFVLFPMLG GLQATMFLVG GVVLLLGAFS VFLIKETHGG

451 V*

ORF16a and ORF16-1 show 99.6% identity in 451 aa overlap:

                    10        20        30        40        50        60
orf16a.pep  MSEYTPQTAKQGLPALAKSTIWMLSFGFLGVQTAFTLQSSQMSRIFQTLGADPHSLGWFF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||:|||||
orf16-1     MSEYTPQTAKQGLPALAKSTIWMLSFGFLGVQTAFTLQSSQMSRIFQTLGADPHNLGWFF
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf16a.pep  ILPPLAGMLVQPIVGHYSDRTWKPRLGGRRLPYLLYGTLIAVIVMILMPNSGSFGFGYAS
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf16-1     ILPPLAGMLVQPIVGHYSDRTWKPRLGGRRLPYLLYGTLIAVIVMILMPNSGSFGFGYAS
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf16a.pep  LAALSFGALMIALLDVSSNMAMQPFKMMVGDMVNEEQKGYAYGIQSFLANTGAVVAAILP
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf16-1     LAALSFGALMIALLDVSSNMAMQPFKMMVGDMVNEEQKGYAYGIQSFLANTGAVVAAILP
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf16a.pep  FVFAYIGLANTAEKGVVPQTVVVAFYVGAALLVITSAFTIFKVKEYNPETYARYHGIDVA
            ||||||||||||||||||||||||||||||||||||||||||||||:|||||||||||||
orf16-1     FVFAYIGLANTAEKGVVPQTVVVAFYVGAALLVITSAFTIFKVKEYDPETYARYHGIDVA
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf16a.pep  ANQEKANWIELLKTAPKAFWTVTLVQFFCWFAFQYMWTYSAGAIAENVWHTTDASSVGYQ
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf16-1     ANQEKANWIELLKTAPKAFWTVTLVQFFCWFAFQYMWTYSAGAIAENVWHTTDASSVGYQ
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf16a.pep  EAGNWYGVLAAVQSVAAVICSFVLAKVPNKYHKAGYFGCLALGALGFFSVFFIGNQYALV
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf16-1     EAGNWYGVLAAVQSVAAVICSFVLAKVPNKYHKAGYFGCLALGALGFFSVFFIGNQYALV
                   310       320       330       340       350       360

                   370       380       390       400       410       420
orf16a.pep  LSYTLIGIAWAGIITYPLTIVTNALSGKHMGTYLGLFNGSICMPQIVASLLSFVLFPMLG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf16-1     LSYTLIGIAWAGIITYPLTIVTNALSGKHMGTYLGLFNGSICMPQIVASLLSFVLFPMLG
                   370       380       390       400       410       420

                   430       440       450
orf16a.pep  GLQATMFLVGGVVLLLGAFSVFLIKETHGGVX
            |||||||||||||||||||||||||||||||
orf16-1     GLQATMFLVGGVVLLLGAFSVFLIKETHGGVX
                   430       440       450



Homology with a predicted ORF from N. gonorrhoeae

ORF16 shows 93.9% identity over a 181aa overlap with a predicted ORF (ORF16.ng) from N.
gonorrhoeae:

orf16.pep                                GHYSDRTWKPRLXGRRLPYLLYGTLIAVIV  30
                                         |:|||||||||| |||||||||||||||||
orf16ng    HFSNARRRPAQFGLVFHPAAAGGDAGSADSGYYSDRTWKPRLGGRRLPYLLYGTLIAVIV 131

orf16.pep  MILMPNSGSFGFGYASLAALSFGALMIALLDVSSNMAMQPFKMMVGDMVNEEQKXYAYGI  90
           |||||||||||||||||||||||||||||||||||||||||||||||||||||| |||||
orf16ng    MILMPNSGSFGFGYASLAALSFGALMIALLDVSSNMAMQPFKMMVGDMVNEEQKSYAYGI 191

orf16.pep  QSFLANTGAVVAAILPFVFAYIGLANTAXKGVVPQTVVVAFYVGAALLVITSAFTIFKVK 150
           ||||||| |||||||||||||||||||| |||||||||||||||||||:||||||| |||
orf16ng    QSFLANTDAVVAAILPFVFAYIGLANTAEKGVVPQTVVVAFYVGAALLIITSAFTISKVK 251

orf16.pep  EYXPETYARYHGIDVAANQEKANWIALLKXA 181
           || |||||||||||||||||||||: |||:|
orf16ng    EYDPETYARYHGIDVAANQEKANWFELLKTAPKVFWTVTPVQFFCWFAFRYMWTYSAGAI 311

The complete length ORF16ng nucleotide sequence <SEQ ID 153> is:



1 ATGATAGGGG ATCGCCGCGC CGGCAACCAT TTCGGATTTT CCAAAGCAAA

51 TACTTTTCAA ATCAAAAAAA AGGATTTACT TTATGTCGGA ATATACGCCT

101 CAAACAGCAA AACAAGGTTT GCCCGCGCCG GCAAAAAGCA CGATTTGGAT

151 GTTGAGCTTC GGCTATCTCG GCGTTCAGAC GGCCTTTACC CTGCAAAGCT

201 CGCAGATGAG CCGCATTTTT CAAACGCTAG GCGCAGACCC GCACAATTTG

251 GGCTGGTTTT TCATCCTGCC GCCGCTGGCG GGGATGCTGG TTCAGCCGAT

301 AGTGGCTACT ACTCAGACCG CACTTGGAAG CCGCGCTTGG GCGGCCGCCG

351 CCTGCCGTAT CTGCTTTACG GCACGCTGAT TGCGGTCATC GTGATGATTT

401 TGATGCCGAA CTCGGGCAGC TTCGGTTTCG GCTATGCGTC GCTGGCGGCC

451 TTGTCGTTCG GCGCGCTGAT GATTGCGCTG TTGGACGTGT CGTCGAATAT

501 GGCGATGCAG CCGTTTAAGA TGATGGTCGG CGATATGGTC AACGAGGAGC

551 AGAAAAGCTA CGCCTACGGG ATTCAAAGTT TCTTAGCGAA TACGGACGCG

601 GTTGTGGCAG CGATTCTGCC GTTTGTGTTC GCGTATATCG GTTTGGCGAA

651 CACTGCCGAG AAAGGCGTTG TGCCACAAAC CGTGGTCGTA GCATTCTATG

701 TGGGTGCGGC GTTACTGATT ATTACCAGTG CGTTCACAAT CTCCAAAGTC

751 AAAGAATACG ACCCGGAAAC CTACGCCCGT TACCACGGCA TCGATGTCGC

801 CGCGAATCAG GAAAAAGCCA ACTGGTTCGA ACTCTTAAAA ACCGCGCCTA

851 AAGTGTTTTG GACGGTTACT CCGGTACAGT TTTTCTGCTG GTTCGCCTTC

901 CGGTATATGT GGACTTACTC GGCAGGCGCG ATTGCAGAAA ACGTCTGGCA

951 CACTACCGAT GCGTCTTCCG TAGGCCATCA GGAGGCGGGC AACCGGTACG

1001 GCGTTTTGGC GGCGGTGTAG



This encodes a protein having amino acid sequence <SEQ ID 154>: 



1 MIGDRRAGNH FGFSKANTFQ IKKKDLLYVG IYASNSKTRF ARAGKKHDLD
51 VELRLSRRSD GLYPAKLADE PHFSNARRRP AQFGLVFHPA AAGGDAGSAD
101 SGYYSDRTWK PRLGGRRLPY LLYGTLIAVI VMILMPNSGS FGFGYASLAA
151 LSFGALMIAL LDVSSNMAMQ PFKMMVGDMV NEEQKSYAYG IQSFLANTDA
201 VVAAILPFVF AYIGLANTAE KGVVPQTVVV AFYVGAALLI ITSAFTISKV
251 KEYDPETYAR YHGIDVAANQ EKANWFELLK TAPKVFWTVT PVQFFCWFAF
301 RYMWTYSAGA IAENVWHTTD ASSVGHQEAG NRYGVLAAV*






ORF16ng and ORF16-1 show 89.3% identity in 261 aa overlap: 

30 40 50 60 70 80
orf16-1.pep MLSFGFLGVQTAFTLQSSQMSRIFQTLGADPHNLGWFFILPPLAGMLVQPI-VGHYSDRT
I : : I ! i II : I : I I I I t
orf16ng DVELRLSRRSDGLYPAKLADEPHFSNARRRPAQFGLVF-HPAAAGGDAGSADSGYYSDRT
50 60 70 80 90 100

90 100 110 120 130 140
orf16-1.pep WKPRLGGRRLPYLLYGTLIAVIVMILMPNSGSFGFGYASLAALSFGALMIALLDVSSNMA
I I ( I f I I I i I I t I I I I I I M I I t I I I I I I I I I I I I t t t t I t I I I I I I I 1 I I I II t I I t I I
orf16ng WKPRLGGRRLPYLLYGTLIAVIVMILMPNSGSFGFGYASLAALSFGALMIALLDVSSNMA
110 120 130 140 150 160

150 160 170 180 190 200
orf16-1.pep MQPFKMMVGDMVNEEQKGYAYGIQSFLANTGAVVAAILPFVFAYIGLANTAEKGVVPQTV
1 I I I I I I II I I I I I I I I : I I I I I I I I I I I I I I I I I I i I I I I II I I I I I I I I I I i I I I I I
orf16ng MQPFKMMVGDMVNEEQKSYAYGIQSFLANTDAVVAAILPFVFAYIGLANTAEKGVVPQTV
170 180 190 200 210 220

210 220 230 240 250 260
orf16-1.pep VVAFYVGAALLVITSAFTIFKVKEYDPETYARYHGIDVAANQEKANWIELLKTAPKAFWT
I t I I I I I I I I I : I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I : II I II 1 I I : I I I
orf16ng VVAFYVGAALLIITSAFTISKVKEYDPETYARYHGIDVAANQEKANWFELLKTAPKVFWT
230 240 250 260 270 280

270 280 290 300 310 320
orf16-1.pep VTLVQFFCWFAFQYMWTYSAGAIAEMVWHTTDASSVGYQEAGNWYGVLAAVQSVAAVICS
II M I I I 11 I I : I I I I I I I ! I I II I II I I I II I I I I : I I I I I I I I I I I I
orf16ng VTPVQFFCWFAFRYMWTYSAGAIAENVWHTTDASSVGHQEAGNRYGVLAAVX
290 300 310 320 330 340

Based on this analysis, including the presence of several putative transmembrane domains in the
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 19

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 155>: 

1 ATGTTGTTCC GTAAAACGAC CGCCGCCGTT TTGGCGCATA CCTTGATGCT 

51 GAACGGCTGT ACGTTGATGT TGTGGGGAAT GAACAACCCG GTCAGCGAAA 

101 CAATCACCCG NAAACACGTT GNCAAAGACC AAATCCGNGN CTTCGGTGTG 

151 GTTGCCGAAG ACAATGCCCA ATTGGAAAAG GGCAGCCTGG TGATGATGGG

201 CGGAAAATAC TGGTTCGTCG TCAATCCCGA AGATTCGGCG AA.NTGACGG

251 GNATTTTGAN GGCAGGGCTG GACAAACCCT TCCAAATAGT TNAGGATACC

301 CCGAGCTATG C.TGCCACCA AGCCCTGCCG GTCAAACTCG GATCGNCTGG

351 CAGCCAGAAT...

This corresponds to the amino acid sequence <SEQ ID 156; ORF28>:

1 MLFRKTTAAV LAHTLMLNGC TLMLWGMNNP VSETITRKHV XKDQIRXFGV

51 VAEDNAQLEK GSLVMMGGKY WFVVNPEDSA XXTGILXAGL DKPFQIVXDT

101 PSYXCHQALP VKLGSXGSQN...
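
The relationship between the partial nucleotide sequence and the partial protein sequence can be sketched as below. This is a minimal illustration using the standard genetic code, not the software actually used to derive SEQ ID 156. The function names and the handling of unreadable bases are assumptions: a codon containing `N` or `.` is reported as a specific residue only when all four possible bases give the same amino acid, and as `X` otherwise, which mirrors how `X` appears in the sequence above.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for the first, second and third base.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_codon(codon):
    """Translate one codon; any base other than A/C/G/T is tried as all four bases."""
    options = [b if b in BASES else BASES for b in codon.upper()]
    residues = {CODON_TABLE["".join(c)] for c in product(*options)}
    # A single possible residue is reported as such; anything ambiguous becomes X.
    return residues.pop() if len(residues) == 1 else "X"


def translate(dna):
    """Translate DNA in frame 1, ignoring whitespace and any trailing partial codon."""
    seq = "".join(dna.split())
    return "".join(translate_codon(seq[i:i + 3]) for i in range(0, len(seq) - 2, 3))


print(translate("ATGTTGTTCC GTAAAACGAC C"))  # MLFRKTT
print(translate("ACCCGNAAACACGTTGNC"))       # TRKHVX
```

The second call uses bases 97 to 114 of SEQ ID 155: `CGN` is read as arginine whatever the missing base is, while `GNC` cannot be resolved and is reported as `X`.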

Further work revealed the complete nucleotide sequence <SEQ ID 157>:

1 ATGTTGTTCC GTAAAACGAC CGCCGCCGTT TTGGCGGCAA CCTTGATGCT

51 GAACGGCTGT ACGTTGATGT TGTGGGGAAT GAACAACCCG GTCAGCGAAA 

101 CAATCACCCG CAAACACGTT GACAAAGACC AAATCCGCGC CTTCGGTGTG 

151 GTTGCCGAAG ACAATGCCCA ATTGGAAAAG GGCAGCCTGG TGATGATGGG 

201 CGGAAAATAC TGGTTCGTCG TCAATCCCGA AGATTCGGCG AAGCTGACGG 

251 GCATTTTGAA GGCAGGGCTG GACAAACCCT TCCAAATAGT TGAGGATACC

301 CCGAGCTATG CTCGCCACCA AGCCCTGCCG GTCAAACTCG AATCGCCTGG

351 CAGCCAGAAT TTCAGTACCG AAGGCCTTTG CCTGCGCTAC GATACCGACA

401 AGCCTGCCGA CATCGCCAAG CTGAAACAGC TCGGGTTTGA AGCGGTCAAA

451 CTCGACAATC GGACCATTTA CACGCGCTGC GTATCCGCCA AAGGCAAATA

501 CTACGCCACA CCGCAAAAAC TGAACGCCGA TTACCATTTT GAGCAAAGTG

551 TGCCTGCCGA TATTTATTAC ACGGTTACTG AAGAACATAC CGACAAATCC

601 AAGCTGTTTG CAAATATCTT ATATACGCCC CCCTTTTTGA TACTGGATGC

651 GGCGGGCGCG GTACTGGCCT TGCCTGCGGC GGCTCTGGGT GCGGTCGTGG

701 ATGCCGCCCG CAAATGA



This corresponds to the amino acid sequence <SEQ ID 158; ORF28-1>:

1 MLFRKTTAAV LAATLMLNGC TLMLWGMNNP VSETITRKHV DKDQIRAFGV
51 VAEDNAQLEK GSLVMMGGKY WFVVNPEDSA KLTGILKAGL DKPFQIVEDT
101 PSYARHQALP VKLESPGSQN FSTEGLCLRY DTDKPADIAK LKQLGFEAVK
151 LDNRTIYTRC VSAKGKYYAT PQKLNADYHF EQSVPADIYY TVTEEHTDKS
201 KLFANILYTP PFLILDAAGA VLALPAAALG AVVDAARK*



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF28 shows 79.2% identity over a 120aa overlap with an ORF (ORF28a) from strain A of N.
meningitidis:

10 20 30 40 50 60
orf28.pep MLFRKTTAAVLAHTLMLNGCTLMLWGMNNPVSETITRKHVXKDQIRXFGVVAEDNAQLEK
I | ! | I 1 I i I I i I I I ! I I I I I : I : 1 I I i - I I t I : I I I I I I I I t I I I I I I I I I I I t I
orf28a MLFRKTTAAVLAATLMLNGCTVMMWGMNSPFSETTARKHVDKDQIRAFGVVAEDNAQLEK
10 20 30 40 50 60

70 80 90 100 110 120
orf28.pep GSLVMMGGKYWFVVNPEDSAXXTGILXAGLDKPFQIVXDTPSYXCHQALPVKLGSXGSQN
I I I I I t I I I I I I I I I I I I I I MM M M I MM : I : M M M M I MM
orf28a GSLVMMGGKYWFVVNPEDSAKLTGILKAGLDKQFQMVEPNPRFA-YQALPVKLESPASQN
70 80 90 100 110

orf28a FSTEGLCLRYDTDRPADIAKLKQLEFEAVELDNRTIYTRCVSAKGKYYATPQKLNADYHF
120 130 140 150 160 170


The complete length ORF28a nucleotide sequence <SEQ ID 159> is: 



1 ATGTTGTTCC GTAAAACGAC CGCCGCCGTT TTGGCGGCAA CCTTGATGTT
51 GAACGGCTGT ACGGTAATGA TGTGGGGTAT GAACAGCCCG TTCAGCGAAA
101 CGACCGCCCG CAAACACGTT GACAAGGACC AAATCCGCGC CTTCGGTGTG
151 GTTGCCGAAG ACAATGCCCA ATTGGAAAAG GGCAGCCTGG TGATGATGGG
201 CGGGAAATAC TGGTTCGTCG TCAATCCTGA AGATTCGGCG AAGCTGACGG
251 GCATTTTGAA GGCCGGGTTG GACAAGCAGT TTCAAATGGT TGAGCCCAAC
301 CCGCGCTTTG CCTACCAAGC CCTGCCGGTC AAACTCGAAT CGCCCGCCAG
351 CCAGAATTTC AGTACCGAAG GCCTTTGCCT GCGCTACGAT ACCGACAGAC
401 CTGCCGACAT CGCCAAGCTG AAACAGCTTG AGTTTGAAGC GGTCGAACTC
451 GACAATCGGA CCATTTACAC GCGCTGCGTC TCCGCCAAAG GCAAATACTA
501 CGCCACACCG CAAAAACTGA ACGCCGATTA TCATTTTGAG CAAAGTGTGC
551 CTGCCGATAT TTATTACACG GTTACGAAAA AACATACCGA CAAATCCAAG
601 TTGTTTGAAA ATATTGCATA TACGCCCACC ACGTTGATAC TGGATGCGGT
651 GGGCGCGGTG CTGGCCTTGC CTGTCGCGGC GTTGATTGCA GCCACGAATT
701 CCTCAGACAA ATGA



This encodes a protein having amino acid sequence <SEQ ID 160>:

1 MLFRKTTAAV LAATLMLNGC TVMMWGMNSP FSETTARKHV DKDQIRAFGV

51 VAEDNAQLEK GSLVMMGGKY WFVVNPEDSA KLTGILKAGL DKQFQMVEPN

101 PRFAYQALPV KLESPASQNF STEGLCLRYD TDRPADIAKL KQLEFEAVEL

151 DNRTIYTRCV SAKGKYYATP QKLNADYHFE QSVPADIYYT VTKKHTDKSK

201 LFENIAYTPT TLILDAVGAV LALPVAALIA ATNSSDK*



ORF28a and ORF28-1 show 86.1% identity in 238 aa overlap:

10 20 30 40 50 60
orf28a.pep MLFRKTTAAVLAATLMLNGCTVMMWGMNSPFSETTARKHVDKDQIRAFGVVAEDNAQLEK
| | 1 i | | | | | | | t f I I I I I I I I : 1 : I I I I : I i I I : I I I I I I I I I t I I I I I I I I I I I I I t
orf28-1 MLFRKTTAAVLAATLMLNGCTLMLWGMNNPVSETITRKHVDKDQIRAFGVVAEDNAQLEK
10 20 30 40 50 60

70 80 90 100 110 119
orf28a.pep GSLVMMGGKYWFVVNPEDSAKLTGILKAGLDKQFQMVEPNPRFA-YQALPVKLESPASQN
I 1 | M I 1 I I { I I ! 1 I I 1 I I I I I I I I I I 1 I I I I 11:11 : I : I : I I I I I I I I M : I I I
orf28-1 GSLVMMGGKYWFVVNPEDSAKLTGILKAGLDKPFQIVEDTPSYARHQALPVKLESPGSQN
70 80 90 100 110 120

120 130 140 150 160 170 179
orf28a.pep FSTEGLCLRYDTDRPADIAKLKQLEFEAVELDNRTIYTRCVSAKGKYYATPQKLNADYHF
I | | I | i II I 1 I I t : I I I I M 1 I II I I I I : I I II I II I I I I I I I I I II I I I I I I I I I I I I
orf28-1 FSTEGLCLRYDTDKPADIAKLKQLGFEAVKLDNRTIYTRCVSAKGKYYATPQKLNADYHF
130 140 150 160 170 180

180 190 200 210 220 230
orf28a.pep EQSVPADIYYTVTKKHTDKSKLFENIAYTPTTLILDAVGAVLALPVAALIAATNSSDKX
I I I | 1 I I I I I I I I : : I I I I II I I II III I I I f I : I I I I I I I : I I I I : : : : : II
orf28-1 EQSVPADIYYTVTEEHTDKSKLFANILYTPPFLILDAAGAVLALPAAALGAVVDAARKX
190 200 210 220 230



Homology with a predicted ORF from N. gonorrhoeae 

ORF28 shows 84.2% identity over a 120aa overlap with a predicted ORF (ORF28.ng) from N. 
gonorrhoeae: 



orf28.pep MLFRKTTAAVLAHTLMLNGCTLMLWGMNNPVSETITRKHVXKDQIRXFGVVAEDNAQLEK 60

I I I I I I i I i I II I I : I I I I I : I I I I II I II: I I i I I I I Mill I I I I! I I I II I M
orf28ng MLFRKTTAAVLAATLILNGCTMMLRGMNNPVSQTITRKHVDKDQIRAFGVVAEDNAQLEK 60

orf 28. pep GSLVMMGGKYWFVVNPEDSAXXTGILXAGLDKPFQIVXDTPSYXCHQALPVKLGSXGSQN 120 

I I I I I I I I I I I I : I I I I I ! I I I : I I I I I I I I I I ! I I II I I I II I I I : : till 
orf28ng GSLVMMGGKYWFAVNPEDSAKLTGLLKAGLDKPFQIVEDTPSYARHQALPVKFEAPGSQN 120 

The complete length ORF28ng nucleotide sequence <SEQ ID 161> is:



1 ATGTTGTTCC GTAAAACGAC CGCCGCCGTT TTGGCGGCAA CCTTGATACT 

51 GAACGGCTGT ACGATGATGT TGCGGGGGAT GAACAACCCG GTCAGCCAAA 

101 CAATCACCCG CAAACACGTT GACAAAGACC AAATCCGCGC CTTCGGTGTG 

151 GTTGCCGAAG ACAATGCCCA ATTGGAAAAG GGCAGCCTGG TGATGATGGG 

201 CGGGAAATAC TGGTTCGCCG TCAATCCCGA AGATTCGGCG AAGCTGACGG 

251 GCCTTTTGAA GGCCGGGTTG GACAAGCCCT TCCAAATAGT TGAGGATACC 

301 CCGAGCTATG CCCGCCACCA AGCCCTGCCG GTCAAATTCG AAGCGCCCGG 

351 CAGCCAGAAT TTCAGTACCG GAGGTCTTTG CCTGCGCTAT GATACCGGCA 

401 GACCTGACGA CATCGCCAAG CTGAAACAGC TTGAGTTTAA AGCGGTCAAA 

451 CTCGACAATC GGACCATTTA CACGCGCTGC GTATCCGCCA AAGGCAAATA 

501 CTACGCCACG CCGCAAAAAC TGAACGCCGA TTATCATTTT GAGCAAAGTG 

551 TGCCCGCCGA TATTTATTAT ACGGTTACTG AAAAACATAC CGACAAATCC 

601 AAGCTGTTTG GAAATATCTT ATATACGCCC CCCTTGTTGA TATTGGATGC 

651 GGCGGCCGCG GTGCTGGTCT TGCCTATGGC TCTGATTGCA GCCGCGAATT 

701 CCTCAGACAA ATGA 

This encodes a protein having amino acid sequence <SEQ ID 162>: 



1 MLFRKTTAAV LAATLILNGC TMMLRGMNNP VSQTITRKHV DKDQIRAFGV

51 VAEDNAQLEK GSLVMMGGKY WFAVNPEDSA KLTGLLKAGL DKPFQIVEDT

101 PSYARHQALP VKFEAPGSQN FSTGGLCLRY DTGRPDDIAK LKQLEFKAVK 

151 LDNRTIYTRC VSAKGKYYAT PQKLNADYHF EQSVPADIYY TVTEKHTDKS 

201 KLFGNILYTP PLLILDAAAA VLVLPMALIA AANSSDK* 



ORF28ng and ORF28-1 share 90.0% identity in 231 aa overlap: 

10 20 30 40 50 60 

orf28-1.pep MLFRKTTAAVLAATLMLNGCTLMLWGMNNPVSETITRKHVDKDQIRAFGVVAEDNAQLEK
I M I I I I I I I I I I I I : I I I I I : I I I I I I I I I : I I i I I M M I I I I M M I I I M I I I I I
orf28ng MLFRKTTAAVLAATLILNGCTMMLRGMNNPVSQTITRKHVDKDQIRAFGVVAEDNAQLEK

10 20 30 40 50 60

70 80 90 100 110 120

orf28-1.pep GSLVMMGGKYWFVVNPEDSAKLTGILKAGLDKPFQIVEDTPSYARHQALPVKLESPGSQN
I I I I I I I I I II I : I I I I II I I I I I : I I I I I I I I I 1 I I I I I I I I I I I I I I I I I : 1 : I I I I I
orf28ng GSLVMMGGKYWFAVNPEDSAKLTGLLKAGLDKPFQIVEDTPSYARHQALPVKFEAPGSQN

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 28-1 .pep FSTEGLCLRYDTDKPADIAKLKQLGFEAVKLDNRTIYTRCVSAKGKYYATPQKLNADYHF 
Ml I 1 ft 1 I i I : i I I I I I II i I : I I I I I I I I I II i I I I I I I I I I I I t I I I II I I I I 
orf28ng FSTGGLCLRYDTGRPDDIAKLKQLEFKAVKLDNRTIYTRCVSAKGKYYATPQKLNADYHF 

130 140 150 160 170 180 

190 200 210 220 230 239 

orf 28-1 . pep EQSVPADIYYTVTEEHTDKSKLFANILYTPPFLILDAAGAVLALPAAALGAVVDAARKX 

I I I I I I I I I I I I I I : I I I I I I I I : I I I I I I I : I I I I I I : I I I : I I I : : I : 
orf28ng EQSVPADIYYTVTEKHTDKSKLFGNILYTPPLLILDAAAAVLVLPMALIAAANSSDKX 

190 200 210 220 230 

Based on this analysis, including the presence of a putative transmembrane domain in the
gonococcal protein, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
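
The text does not say how the putative transmembrane domain was identified. One common way to screen a protein sequence for candidate membrane-spanning segments is a sliding-window hydropathy average, sketched below with the Kyte-Doolittle scale. The function names, the window of 19 residues and the threshold of 1.6 are illustrative assumptions, not parameters taken from this document.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of every full window; unknown residues (e.g. X or *) score 0."""
    scores = [KD.get(aa, 0.0) for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_windows(protein, window=19, threshold=1.6):
    """Return 1-based start positions of windows whose mean hydropathy exceeds the threshold."""
    profile = hydropathy_profile(protein, window)
    return [i + 1 for i, score in enumerate(profile) if score > threshold]
```

Calling `hydrophobic_windows` on a full-length sequence such as SEQ ID 162 returns the start positions of strongly hydrophobic stretches. Such hits are only candidates: a leader peptide can also score highly, so the result needs to be read alongside the other analyses.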

ORF28-1 (24kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
6A shows the results of affinity purification of the GST-fusion protein, and Figure 6B shows the
results of expression of the His-fusion in E. coli. Purified GST-fusion protein was used to immunise
mice, whose sera were used for ELISA, which gave a positive result. These experiments confirm
that ORF28-1 is a surface-exposed protein, and that it may be a useful immunogen.
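
A molecular mass such as the 24kDa quoted above can be compared with a value estimated from the amino acid sequence by summing average residue masses and adding one water molecule. The sketch below is illustrative only: the function name is hypothetical, and the document does not state how the quoted mass was obtained.

```python
# Average residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.015  # added once for the free N- and C-termini


def molecular_mass_kda(protein):
    """Estimate the average molecular mass of an unmodified protein chain in kDa."""
    residues = "".join(protein.split()).upper().rstrip("*")  # drop spaces and stop marker
    unknown = set(residues) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"cannot assign a mass to residues: {sorted(unknown)}")
    return (sum(RESIDUE_MASS[aa] for aa in residues) + WATER) / 1000.0
```

The estimate applies to the bare polypeptide. A GST or His fusion partner adds its own mass, and removal of a leader peptide lowers the mass, so an SDS-PAGE band need not match the value computed from the full open reading frame.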



Example 20 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 163>:

1 . . GTCAGTCCTG TACTGCCTAT TACACACGAA CGGACAGGGT TTGAAGGTGT 

51 TATCGGTTAT GAAACCCATT TTTCAGGGCA CGGACATGAA GTACACAGTC 

101 CGTTCGATCA TCATGATTCA AAAAGCACTT CTGATTTCAG CGGCGGTGTA 

151 GACGGCGGTT TTACTGTTTA CCAACTTCAT CGAACATGGT CGGAAATCCA 

201 TCCGGAGGAT GAATATGACG GGCCGCAAGC AGCG.ATTAT CCGCCCCCCG 

251 GAGGAGCAAG GGATATATAC AGCTATTATG TCAAAGGAAC TTCAACAAAA 

301 ACAAAGACTA GTATTGTCCC TCAAGCCCCA TTTTCAGACC GTTGGCTAGA 

351 AGAAAATGCC GGTGCCGCCT CTGGT..

This corresponds to the amino acid sequence <SEQ ID 164; ORF29>:

1 ..VSPVLPITHE RTGFEGVIGY ETHFSGHGHE VHSPFDHHDS KSTSDFSGGV

51 DGGFTVYQLH RTWSEIHPED EYDGPQAAXY PPPGGARDIY SYYVKGTSTK

101 TKTSIVPQAP FSDRWLEENA GAASG..

Further work revealed the complete nucleotide sequence <SEQ ID 165>: 

1 ATGAATTTGC CTATTCAAAA ATTCATGATG CTGTTTGCAG CAGCAATATC

51 GTTGCTGCAA ATCCCCATTA GTCATGCGAA CGGTTTGGAT GCCCGTTTGC

101 GCGATGATAT GCAGGCAAAA CACTACGAAC CGGGTGGTAA ATACCATCTG

151 TTTGGTAATG CTCGCGGCAG TGTTAAAAAG CGGGTTTACG CCGTCCAGAC

201 ATTTGATGCA ACTGCGGTCA GTCCTGTACT GCCTATTACA CACGAACGGA

251 CAGGGTTTGA AGGTGTTATC GGTTATGAAA CCCATTTTTC AGGGCACGGA 

301 CATGAAGTAC ACAGTCCGTT CGATCATCAT GATTCAAAAA GCACTTCTGA 

351 TTTCAGCGGC GGTGTAGACG GCGGTTTTAC TGTTTACCAA CTTCATCGAA 

401 CAGGGTCGGA AATCCATCCG GAGGATGGAT ATGACGGGCC GCAAGGCAGC 

451 GATTATCCGC CCCCCGGAGG AGCAAGGGAT ATATACAGCT ATTATGTCAA 

501 AGGAACTTCA ACAAAAACAA AGACTAATAT TGTCCCTCAA GCCCCATTTT 

551 CAGACCGTTG GCTAAAAGAA AATGCCGGTG CCGCCTCTGG TTTTTTCAGC 

601 CGTGCGGATG AAGCAGGAAA ACTGATATGG GAAAGCGACC CCAATAAAAA 

651 TTGGTGGGCT AACCGTATGG ATGATGTTCG CGGCATCGTC CAAGGTGCGG 

701 TTAATCCTTT TTTAATGGGT TTTCAAGGAG TAGGGATTGG GGCAATTACA 

751 GACAGTGCAG TAAGCCCGGT CACAGATACA GCCGCGCAGC AGACTCTACA 

801 AGGTATTAAT GATTTAGGAA AATTAAGTCC GGAAGCACAA CTTGCTGCCG 

851 CGAGCCTATT ACAGGACAGT GCTTTTGCGG TAAAAGACGG TATCAACTCT 

901 GCCAAACAAT GGGCTGATGC CCATCCAAAT ATAACAGCTA CTGCCCAAAC 

951 TGCCCTTTCC GCAGCAGAGG CCGCAGGTAC GGTTTGGAGA GGTAAAAAAG 

1001 TAGAACTTAA CCCGACTAAA TGGGATTGGG TTAAAAATAC CGGTTATAAA 

1051 AAACCTGCTG CCCGCCATAT GCAGACTTTA GATGGGGAGA TGGCAGGTGG 

1101 GAATAAACCT ATTAAATCTT TACCAAACAG TGCCGCTGAA AAAAGAAAAC 

1151 AAAATTTTGA GAAGTTTAAT AGTAACTGGA GTTCAGCAAG TTTTGATTCA 

1201 GTGCACAAAA CACTAACTCC CAATGCACCT GGTATTTTAA GTCCTGATAA 

1251 AGTTAAAACT CGATACACTA GTTTAGATGG AAAAATTACA ATTATAAAAG

1301 ATAACGAAAA CAACTATTTT AGAATCCATG ATAATTCACG AAAACAGTAT 

1351 CTTGATTCAA ATGGTAATGC TGTGAAAACC GGTAATTTAC AAGGTAAGCA 

1401 AGCAAAAGAT TATTTACAAC AACAAACTCA TATCAGGAAC TTAGACAAAT 

1451 GA 

This corresponds to the amino acid sequence <SEQ ID 166; ORF29-1>:

1 MNLPIQKFMM LFAAAISLLQ IPISHANGLD ARLRDDMQAK HYEPGGKYHL

51 FGNARGSVKK RVYAVQTFDA TAVSPVLPIT HERTGFEGVI GYETHFSGHG 

101 HEVHSPFDHH DSKSTSDFSG GVDGGFTVYQ LHRTGSEIHP EDGYDGPQGS 

151 DYPPPGGARD IYSYYVKGTS TKTKTNIVPQ APFSDRWLKE NAGAASGFFS 

201 RADEAGKLIW ESDPNKNWWA NRMDDVRGIV QGAVNPFLMG FQGVGIGAIT

251 DSAVSPVTDT AAQQTLQGIN DLGKLSPEAQ LAAASLLQDS AFAVKDGINS 

301 AKQWADAHPN ITATAQTALS AAEAAGTVWR GKKVELNPTK WDWVKNTGYK 

351 KPAARHMQTL DGEMAGGNKP IKSLPNSAAE KRKQNFEKFN SNWSSASFDS 

401 VHKTLTPNAP GILSPDKVKT RYTSLDGKIT IIKDNENNYF RIHDNSRKQY 

451 LDSNGNAVKT GNLQGKQAKD YLQQQTHIRN LDK*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF29 shows 88.0% identity over a 125aa overlap with an ORF (ORF29a) from strain A of N. 
meningitidis: 



10 20 30 

or f 2 9 . pep VSPVLPITHERTGFEGVIGYETHFSGHGHE 

I : I : M I I I I t I I I I I : I I II I I I I I 1 I I I 
orf2 9a EPGGKYHLFGNARGSVKNRVYAVQTFDATAVGPILPITHERTGFEGIIGYETHFSGHGHE 
50 60 70 80 90 100 



40 50 60 70 80 90 

orf29.pep VHSPFDHHDSKSTSDFSGGVDGGFTVYQLHRTWSEIHPEDEYDGPQAAXYPPPGGARDIY 
I I I i I t : I 1 I I I t 1 I I I I I I I I I I I I I I I I I I I i I i I I I Mill:: I I I I I I i M I i 
orf29a VHSPFDNHDSKSTSDFSGGVDGGFTVYQLHRTGSEIHPEDGYDGPQGSDYPPPGGARDIY
110 120 130 140 150 160 



100 110 120 

or f 2 9. pep SYYVKGTSTKTKTSIVPQAPFSDRWLEENAGAASG 
I II I I I I I I I : : I I I : I I I I I I I I : I I I I II I I 
orf2 9a XXYVKGTSTKTKSNIVPRAPFSDRWLKENAGAASGFFSRADEAGKLIWESDPNKNWWANR 
170 180 190 200 210 220 



orf29a MDDIRGIVQGAVNPFLMGFQGVGIGAITDSAVSPVTDTAAQQTLQGXNHLGXLSPEAQLA
230 240 250 260 270 280

The complete length ORF29a nucleotide sequence <SEQ ID 167> is:

1 ATGAATTNGC CTATTCAAAA ATTCATGATG CTGTTTGCAG CAGCAATATC 

51 GTNGCTGCAA ATCCCNATTA GTCATGCGAA CGGTTTGGAT GCCCGTTTGC 

101 GCGATGATAT GCAGGCAAAA CACTACGAAC CGGGTGGTAA ATACCATCTG 

151 TTTGGTAATG CTCGCGGCAG TGTTAAAAAT CGGGTTTACG CCGTCCAAAC 

201 ATTTGATGCA ACTGCGGTCG GCCCCATACT GCCTATTACA CACGAACGGA 

251 CAGGATTTGA AGGCATTATC GGTTATGAAA CCCATTTTTC AGGACATGGA 

301 CATGAAGTAC ACAGTCCGTT CGATAATCAT GATTCAAAAA GCACTTCTGA 

351 TTTCAGCGGC GGCGTAGACG GTGGTTTTAC CGTTTACCAA CTTCATCGGA 

401 CAGGGTCGGA AATCCATCCG GAGGATGGAT ATGACGGGCC GCAAGGCAGC 

451 GATTATCCGC CCCCCGGAGG AGCAAGGGAT ATATACANNT ANTATGTCAA 

501 AGGAACTTCA ACAAAAACAA AGAGTAATAT TGTTCCCCGA GCCCCATTTT 

551 CAGACCGCTG GCTAAAAGAA AATGCCGGTG CCGCCTCTGG TTTTTTCAGC 

601 CGTGCTGATG AAGCAGGAAA ACTGATATGG GAAAGCGACC CCAATAAAAA 

651 TTGGTGGGCT AACCGTATGG ATGATATTCG CGGCATCGTC CAAGGTGCGG

701 TTAATCCTTT TTTAATGGGT TTTCAAGGAG TAGGGATTGG GGCAATTACA

751 GACAGTGCAG TAAGCCCGGT CACAGATACA GCCGCGCAGC AGACTCTACA

801 AGGTATNAAT CATTTAGGAA ANTTAAGTCC CGAAGCACAA CTTGCGGCTG 

851 CAACCGCATT ACAAGACAGT GCTTTTGCGG TAAAAGACGG TATCAATTCC 

901 GCCAGACAAT GGGCTGATGC CCATCCGAAT ATAACTGCAA CAGCCCAAAC 

951 TGCCCTTGCC GTAGCAGANG CCGCAACTAC GGTTTGGGGC GGTAAAAAAG 

1001 TAGAACTTAA CCCGACCAAA TGGGATTGGG TTAAAAATAC NGGCTATAAN 

1051 ACACCTGCTG TTCGCACCAT GCATACTTTG GATGGGGAAA TGGCCGGTGG 

1101 GAATAGACCG CCTAAATCTA TAACGTCCAA CAGCAAAGCA GATGCTTCCA 

1151 CACAACCGTC TTTACAAGCG CAACTAATTG GAGAACAAAT TANNNNNGGG 

1201 CATGCTTATA ACAAGCATGT CATAAGACAA CAAGAATTTA CGGATTTAAA 

1251 TATCAATTCA CCAGCAGATT TTGCTCGGCA TATTGAAAAT ATTGTTAGCC 

1301 ATCCANCAAA TATGAAAGAG TTACCTCGCG GTAGAACTGC GTATTGGGAT 

1351 NATAAAACAG GGACNATAGT TATCCGAGAT AAAAATTCTG ACGATGGAGG 

1401 TACAGCATTT AGACCAACAT CAGGTAAAAA ATATTATGAT GATTTATAG

This encodes a protein having amino acid sequence <SEQ ID 168>: 

1 MNXPIQKFMM LFAAAISXLQ IPISHANGLD ARLRDDMQAK HYEPGGKYHL

51 FGNARGSVKN RVYAVQTFDA TAVGPILPIT HERTGFEGII GYETHFSGHG 

101 HEVHSPFDNH DSKSTSDFSG GVDGGFTVYQ LHRTGSEIHP EDGYDGPQGS 

151 DYPPPGGARD IYXXYVKGTS TKTKSNIVPR APFSDRWLKE NAGAASGFFS 

201 RADEAGKLIW ESDPNKNWWA NRMDDIRGIV QGAVNPFLMG FQGVGIGAIT 

251 DSAVSPVTDT AAQQTLQGXN HLGXLSPEAQ LAAATALQDS AFAVKDGINS 

301 ARQWADAHPN ITATAQTALA VAXAATTVWG GKKVELNPTK WDWVKNTGYX 

351 TPAVRTMHTL DGEMAGGNRP PKSITSNSKA DASTQPSLQA QLIGEQIXXG 

401 HAYNKHVIRQ QEFTDLNINS PADFARHIEN IVSHPXNMKE LPRGRTAYWD 

451 XKTGTIVIRD KNSDDGGTAF RPTSGKKYYD DL* 

ORF29a and ORF29-1 show 90.1% identity in 385 aa overlap:

10 20 30 40 50 60
orf29a.pep MNXPIQKFMMLFAAAISXLQIPISHANGLDARLRDDMQAKHYEPGGKYHLFGNARGSVKN
11 ! I f I 1 i I I I I I I I I I I i I II M I I I I I I I I 1 i I I ! I I II t i I I I I I ! I ! I I II I I :
orf29-1 MNLPIQKFMMLFAAAISLLQIPISHANGLDARLRDDMQAKHYEPGGKYHLFGNARGSVKK
10 20 30 40 50 60

70 80 90 100 110 120
orf29a.pep RVYAVQTFDATAVGPILPITHERTGFEGIIGYETHFSGHGHEVHSPFDNHDSKSTSDFSG
I I I I I I I I I I I I I : I : M I II I I I 1 I I I : I I I I I I I I I M I M I I I 1 1 : I I M I I I I 1 I I
orf29-1 RVYAVQTFDATAVSPVLPITHERTGFEGVIGYETHFSGHGHEVHSPFDHHDSKSTSDFSG
70 80 90 100 110 120

130 140 150 160 170 180
orf29a.pep GVDGGFTVYQLHRTGSEIHPEDGYDGPQGSDYPPPGGARDIYXXYVKGTSTKTKSNIVPR
orf29-1 GVDGGFTVYQLHRTGSEIHPEDGYDGPQGSDYPPPGGARDIYSYYVKGTSTKTKTNIVPQ
130 140 150 160 170 180

190 200 210 220 230 240
orf29a.pep APFSDRWLKENAGAASGFFSRADEAGKLIWESDPNKNWWANRMDDIRGIVQGAVNPFLMG
I I I I I I I I II I I ! I I I I I I I II I i I I I I I I I I I I I I I I I I I I I I I : I I I I I I I II I I I I I
orf29-1 APFSDRWLKENAGAASGFFSRADEAGKLIWESDPNKNWWANRMDDVRGIVQGAVNPFLMG
190 200 210 220 230 240

250 260 270 280 290 300
orf29a.pep FQGVGIGAITDSAVSPVTDTAAQQTLQGXNHLGXLSPEAQLAAATALQDSAFAVKDGINS
I I | I I I I I I I I I I I M I I I I I 1 I I I I I I I I i I I I I I I I I I I : I N I I ! I I i I i i t i
orf29-1 FQGVGIGAITDSAVSPVTDTAAQQTLQGINDLGKLSPEAQLAAASLLQDSAFAVKDGINS
250 260 270 280 290 300

310 320 330 340 350 360
orf29a.pep ARQWADAHPNITATAQTALAVAXAATTVWGGKKVELNPTKWDWVKNTGYXTPAVRTMHTL
I : I I I I j I I I I ! I I I I I I I : : i II III I I I I I II i I I I M t I I 1 I I IM \'W
orf29-1 AKQWADAHPNITATAQTALSAAEAAGTVWRGKKVELNPTKWDWVKNTGYKKPAARHMQTL
310 320 330 340 350 360

370 380 390 400 410 420
orf29a.pep DGEMAGGNRPPKSITSNSKADASTQPSLQAQLIGEQIXXGHAYNKHVIRQQEFTDLNINS
I I I I i I I I : I i I : lit: I
orf29-1 DGEMAGGNKPIKSLP-NSAAEKRKQNFEKFNSNWSSASFDSVHKTLTPNAPGILSPDKVK
370 380 390 400 410

Homology with a predicted ORF from N. gonorrhoeae 

ORF29 shows 88.8% identity over a 125aa overlap with a predicted ORF (ORF29.ng) from N.
gonorrhoeae:

orf29.pep VSPVLPITHERTGFEGVIGYETHFSGHGHE 30
I : 1 : I I i I M I I I I I I I I I I II I II I I I I I
orf29ng EPGGKYHLFGNARGSVKNRVCAVQTFDATAVGPILPITHERTGFEGVIGYETHFSGHGHE 102

orf29.pep VHSPFDHHDSKSTSDFSGGVDGGFTVYQLHRTWSEIHPEDEYDGPQAAXYPPPGGARDIY 90
[ | M I I : I I II I I I I I II M I I I I I I I I I I 1 I I I I I I I I Mill:: I I I I 1 I I I I I I
orf29ng VHSPFDNHDSKSTSDFSGGVDGGFTVYQLHRTGSEIHPEDGYDGPQGGGYPPPGGARDIY 162

orf29.pep SYYVKGTSTKTKTSIVPQAPFSDRWLEENAGAASG 125
I I : : I I I I ! I I I : f I I II I I I I II : 1 I II I I I i
orf29ng SYHIKGTSTKTKINTVPQAPFSDRWLKENAGAASGFLSRADEAGKLIWENDPDKNWRANR 222

The complete length ORF29ng nucleotide sequence <SEQ ID 169> is predicted to encode a protein
having amino acid sequence <SEQ ID 170>:

1 MNLPIQKFMM LFAAAISLLQ IPISHANGLD ARLRDDMQAK HYEPGGKYHL
51 FGNARGSVKN RVCAVQTFDA TAVGPILPIT HERTGFEGVI GYETHFSGHG
101 HEVHSPFDNH DSKSTSDFSG GVDGGFTVYQ LHRTGSEIHP EDGYDGPQGG
151 GYPPPGGARD IYSYHIKGTS TKTKINTVPQ APFSDRWLKE NAGAASGFLS
201 RADEAGKLIW ENDPDKNWRA NRMDDIRGIV QGAVNPFLTG FQGLGVGAIT
251 DSAVSPVTYA AARKTLQGIH NLGNLSPEAQ LAAATALQDS AFAVKDSINS
301 ARQWADAHPN ITATAQTALA VTEAATTVWG GKKVELNPAK WDWVKNTGYK
351 KPAARHMQTV DGEMAGGNKP LESKNTVTTN NFFENTGYTE KVLRQASNGD
401 YHGFPQSVDA FSENGTVIQI VGGDNIVRHK LYIPGSYKGK DGNFEYIREA
451 DGKINHRLFV PNQQLPEK*



In a second experiment, the following DNA sequence <SEQ ID 171> was identified: 



1 atgAATTTGC CTATTCAAAA ATTCATGATG ctgttggcAg cggcaatatc
51 gatgctGCat ATCCCCATTA GTCATGCGAA CGGTTTGGAT GCCCGTTTGC
101 GCGATGATAT GCAGGCAAAA CACTACGAAC CGGGTGGCAA ATACCATCTG
151 TTTGGTAATG CTCGCGGCAG TGTTAAAAAT CGGGTTTGCG CCGTCCAAAC
201 ATTTGATGCA ACTGCGGTCG GCCCCATACT GCCTATTACA CACGAACGGA
251 CAGGATTTGA AGGTGTTATC GGCTATGAAA CCCATTTTTC AGGACACGGA
301 CACGAAGTAC ACAGTCCGTT CGATAATCAT GATTCAAAAA GCACTTCTGA
351 TTTCAGCGGC GGCGTAGACG GCGGTTTTAC CGTTTACCAA CTTCATCGGA
401 CAGGGTCGGA AATACATCCC GCAGACGGAT ATGACGGGCC TCAAGGCGGC
451 GGTTATCCGG AACCACAAGG GGCAAGGGAT ATATACAGCT ACCATATCAA
501 AGGAACTTCA ACCAAAACAA AGATAAACAC TGTTCCGCAA GCCCCTTTTT
551 CAGACCGCTG GCTAAAAGAA AATGCCGGTG CCGCTTCCGG TTTTCTCAGC
601 CGTGCGGATG AAGCAGGAAA ACTGATATGG GAAAACGACC CCGATAAAAA
651 TTGGCGGGCT AACCGTATGG ATGATATTCG CGGCATCGTC CAAGGTGCGG
701 TTAATCCTTT TTTAACGGGT TTTCAAGGGG TAGGGATTGG GGCAATTACA
751 GACAGTGCGG TAAGCCCGGT CACAGATACA GCCGCTCAGC AGACTCTACA
801 AGGTATTAAT GATTTAGGAA ATTTAAGTCC GGAAGCACAA CTTGCCGCCG
851 CGAGCCTATT ACAGGACAGT GCCTTTGCGG TAAAAGACGG CATCAATTCC
901 GCCAGACAAT GGGCTGATGC CCATCCGAAT ATAACAGCAA CAGCCCAAAC
951 TGCCCTTGCC GTAGCAGAGG CCGCAGGTAC GGTTTGGCGC GGTAAAAAAG
1001 TAGAACTTAA CCCGACCAAA TGGGATTGGG TTAAAAATAC CGGCTATAAA
1051 AAACCTGCTG CCCGCCATAT GCAGACTGTA GATGGGGAGA TGGCAGGGGG
1101 GAATAGACCG CCTAAATCTA TAACGTCGGA AGGAAAAGCT AATGCTGCAA
1151 CCTATCCTAA GTTGGTTAAT CAGCTAAATG AGCAAAACTT AAATAACATT
1201 GCGGCTCAAG ATCCAAGATT GAGTCTAGCT ATTCATGAGG GTAAAAAAAA
1251 TTTTCCAATA GGAACTGCAA CTTATGAAGA GGCAGATAGA CTAGGTAAAA
1301 TTTGGGTTGG TGAGGGTGCA AGACAAACTA GTGGAGGCGG ATGGTTAAGT
1351 AGAGATGGCA CTCGACAATA TCGGCCACCA ACAGAAAAAA AATCACAATT
1401 TGCAACTACA GGTATTCAAG CAAATTTTGA AACTTATACT ATTGATTCAA
1451 ATGAAAAAAG AAATAAAATT AAAAATGGAC ATTTAAATAT TAGGTAA

This encodes a protein having amino acid sequence <SEQ ID 172; ORF29ng-l>: 

1 MNLPIQKFMM LLAAAISMLH IPISHANGLD ARLRDDMQAK HYEPGGKYHL

51 FGNARGSVKN RVCAVQTFDA TAVGPILPIT HERTGFEGVI GYETHFSGHG 

101 HEVHSPFDNH DSKSTSDFSG GVDGGFTVYQ LHRTGSEIHP ADGYDGPQGG 

151 GYPEPQGARD IYSYHIKGTS TKTKINTVPQ APFSDRWLKE NAGAASGFLS 

201 RADEAGKLIW ENDPDKNWRA NRMDDIRGIV QGAVNPFLTG FQGVGIGAIT 

251 DSAVSPVTDT AAQQTLQGIN DLGNLSPEAQ LAAASLLQDS AFAVKDGINS 

301 ARQWADAHPN ITATAQTALA VAEAAGTVWR GKKVELNPTK WDWVKNTGYK

351 KPAARHMQTV DGEMAGGNRP PKSITSEGKA NAATYPKLVN QLNEQNLNNI 

401 AAQDPRLSLA IHEGKKNFPI GTATYEEADR LGKIWVGEGA RQTSGGGWLS 

451 RDGTRQYRPP TEKKSQFATT GIQANFETYT IDSNEKRNKI KNGHLNIR* 

ORF29ng-l and ORF29-1 show 86.0% identity in 401 aa overlap: 

10 20 30 40 50 60 

orf29ng-l.pep MNLPIQKFMMLLAAAISMLHIPISHANGLDARLRDDMQAKHYEPGGKYHLFGNARGSVKN 
i | i I I I I I I t i : i t I I i : I : I I I I i I M I I I I I I i I II II I t ! II I I I I I I I I I I i I i I : 
orf29-l MNLPIQKFMMLFAAAISLLQIPISHANGLDARLRDDMQAKHYEPGGKYHLFGNARGSVKK 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf2 9ng-l.pep RVCAVQTFDATAVGPILPITHERTGFEGVIGYETHFSGHGHEVHSPFDNHDSKSTSDFSG 
II I I II I I I I I I : I : I II 1 I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I :! I I I I I I I I I I 
or f 2 9-1 RVYAVQTFDATAVSPVLPITHERTGFEGVIGYETHFSGHGHEVHSPFDHHDSKSTSDFSG 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf29ng-l.pep GVDGGFTVYQLHRTGSEIHPADGYDGPQGGGYPEPQGARDIYSYHIKGTSTKTKINTVPQ 
I I I I I I I I I I I ( I I i I I I I I I I I I II I I : II I I I I I I I I I : : I I 1 I I I I I I III 
orf29-l GVDGGFTVYQLHRTGSEIHPEDGYDGPQGSDYPPPGGARDIYSYYVKGTSTKTKTNIVPQ 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 2 9ng-l . pep APFSDRWLKENAGAASGFLSRADEAGKLIWENDPDKNWRANRMDDIRGIVQGAVNPFLTG 
I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I : I I : I I I I I I I I I : I I I I II I I I I M I 
orf 2 9-1 APFSDRWLKENAGAASGFFSRADEAGKLIWESDPNKNWWANRMDDVRGIVQGAVNPFLMG 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf29ng-l.pep FQGVGIGAITDSAVSPVTDTAAQQTLQGINDLGNLSPEAQLAAASLLQDSAFAVKDGINS 
I I I I I I I I I I I I M I I t I I I I I I I I I I I I I II I : I I I I I I II I I I I I I I I I I I I I I I I I I 
orf 2 9-1 FQGVGIGAITDSAVSPVTDTAAQQTLQGINDLGKLSPEAQLAAASLLQDSAFAVKDGINS 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf29ng-1.pep ARQWADAHPNITATAQTALAVAEAAGTVWRGKKVELNPTKWDWVKNTGYKKPAARHMQTV
I : 1 I I I I I I I I I I I I! I I I : : I I I I 1 I I I M I I II I I I I I I I I I I I I I I I I I f f ! I I M :
orf29-1 AKQWADAHPNITATAQTALSAAEAAGTVWRGKKVELNPTKWDWVKNTGYKKPAARHMQTL

310 320 330 340 350 360 



370 380 390 400 410 419
orf29ng-1.pep DGEMAGGNRPPKSI-TSEGKANAATYPKLVNQLNEQNLNNIAAQDPRLSLAIHEGKKNFP

i | | 1 | 1 | | : | | I : : I : : : : | : : : : : : : : : 

orf 29-1 DGEMAGGNKPIKSLPNSAAEKRKQNFEKFNSNWSSASFDSVHKTLTPNAPGILSPDKVKT 

370 380 390 400 410 420 

420 430 440 450 460 470 479

orf29ng-1.pep IGTATYEEADRLGKIWVGEGARQTSGGGWLSRDGTRQYRPPTEKKSQFATTGIQANFETY

orf29-1 RYTSLDGKITIIKDNENNYFRIHDNSRKQYLDSNGNAVKTGNLQGKQAKDYLQQQTHIRN
430 440 450 460 470 480

Based on this analysis, including the presence of a putative leader sequence in the gonococcal 
protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, 
could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 21

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 173>: 

1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATTGCCCC

51 CGCAATGGCA AACGGCTTGG ACAATCAGGC ATTTGAAGAC CAAATGTTCC

101 ACACGCGGGC AGATGCACCG ATGCAG...

This corresponds to the amino acid sequence <SEQ ID 174; ORF30>:

1 MKKQITAAVM MLSMIAPAMA NGLDNQAFED QMFHTRADAP MQ..

Further work revealed the complete nucleotide sequence <SEQ ID 175>:

1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATTGCCCC

51 CGCAATGGCA AACGGCTTGG ACAATCAGGC ATTTGAAGAC CAAGTGTTCC

101 ACACGCGGGC AGATGCACCG ATGCAGTTGG CGGAGCTTTC TCAAAAGGAG

151 ATGAAGGAGA CAGAGGGGGC GTTTCTTCCA TTGGCTATCT TGGGTGGTGC

201 TGCCATTGGT ATGTGGACAC AGCATGGTTT TAGTTATGCA ACGACAGGCA

251 GACCAGCTTC TGTTAGAGAT GTTGCTATTG CTGGCGGATT AGGCGCAATT

301 CCTGGTGGTG TAGGCGCCGC AGGAAAGGTT GTTTCCTTTG CTAAATATGG

351 ACGTGAGATT AAAATCGGCA ATAATATGCG GATAGCCCCT TTCGGTAATA

401 GAACAGGTCA TCCTATTGGA AAATTTCCCC ATTATCATCG TCGAGTTACG

451 GATAATACGG GCAAGACTTT GCCTGGACAG GGAATTGGTC GTCATCGCCC

501 TTGGGAATCA AAATCTACGG ACAGATCATG GAAAAACCGC TTCTAA

This corresponds to the amino acid sequence <SEQ ID 176; ORF30-1>:

1 MKKQITAAVM MLSMIAPAMA NGLDNQAFED QVFHTRADAP MQLAELSQKE

51 MKETEGAFLP LAILGGAAIG MWTQHGFSYA TTGRPASVRD VAIAGGLGAI

101 PGGVGAAGKV VSFAKYGREI KIGNNMRIAP FGNRTGHPIG KFPHYHRRVT

151 DNTGKTLPGQ GIGRHRPWES KSTDRSWKNR F*

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A)

ORF30 shows 97.6% identity over a 42aa overlap with an ORF (ORF30a) from strain A of N.
meningitidis:

                    10        20        30        40
orf30.pep   MKKQITAAVMMLSMIAPAMANGLDNQAFEDQMFHTRADAPMQ
            |||||||||||||||||||||||||||||||:||||||||||
orf30a      MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKXTXGAFLP
                    10        20        30        40        50        60

orf30a      LXILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGXVGAAGKVVSFAKYGREI
                    70        80        90       100       110       120

The complete length ORF30a nucleotide sequence <SEQ ID 177> is: 

1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATTGCCCC 

51 CGCAATGGCA AACGGCTTGG ACAATCAGGC ATTTGAAGAC CAAGTGTTCC 

101 ACACGCGGGC AGATGCACCG ATGCAGTTGG CGGAGCTTTC TCAAAAGGAG 

151 ATGAAGGANA CAGNGGGGGC GTTTCTTCCA TTGGNTATCT TGGGTGGTGC 

201 TGCCATTGGT ATGTGGACAC AGCATGGTTT TAGTTATGCA ACGACAGGCA 

251 GACCAGCTTC TGTTAGAGAT GTTGCTATTG CTGGCGGATT AGGCGCAATT 

301 CCTGGTGNTG TAGGCGCCGC AGGAAAGGTT GTTTCCTTTG CTAAATATGG 

351 ACGTGAGATT AAAATCGGCA ATAATATGCG GATAGCCCCT TTCGGTAATA 

401 GAACAGGTCA TCCTATTGGN AAATTTCCCC ATTATCATCG TCGAGTTACG

451 GATAATACGG GCAAGACTTT GCCTGGACAG GGAATTGGTC GTCATCGCCC 

501 TTGGGAATCA AAATCTACGG ACAGATCATG GAAAAACCGC TTCTAA 

This encodes a protein having amino acid sequence <SEQ ID 178>: 

1 MKKQITAAVM MLSMIAPAMA NGLDNQAFED QVFHTRADAP MQLAELSQKE
51 MKXTXGAFLP LXILGGAAIG MWTQHGFSYA TTGRPASVRD VAIAGGLGAI
101 PGXVGAAGKV VSFAKYGREI KIGNNMRIAP FGNRTGHPIG KFPHYHRRVT 
151 DNTGKTLPGQ GIGRHRPWES KSTDRSWKNR F* 

ORF30a and ORF30-1 show 97.8% identity in 181 aa overlap: 

orf30a.pep  MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKXTXGAFLP 60
            |||||||||||||||||||||||||||||||||||||||||||||||||||| | |||||
orf30-1     MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP 60

orf30a.pep  LXILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGXVGAAGKVVSFAKYGREI 120
            | |||||||||||||||||||||||||||||||||||||||| |||||||||||||||||
orf30-1     LAILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGGVGAAGKVVSFAKYGREI 120

orf30a.pep  KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNR 180
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf30-1     KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNR 180

orf30a.pep  FX
            ||
orf30-1     FX
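
The identity figure quoted for this alignment can be reproduced by counting identical columns. The following is a minimal sketch, assuming that the terminal stop symbol is excluded from the overlap, that an X aligned with a determined residue counts as a mismatch, and that gap columns never count as identities; the function name `percent_identity` is illustrative only.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percentage of aligned columns in which both rows carry the same residue."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    # Gap columns ('-') are never counted as identities.
    same = sum(1 for a, b in zip(row_a, row_b) if a == b and a != "-")
    return 100.0 * same / len(row_a)


# ORF30a (SEQ ID 178) and ORF30-1, 181 residues each, stop symbol omitted.
ORF30A = (
    "MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKXTXGAFLP"
    "LXILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGXVGAAGKVVSFAKYGREI"
    "KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNR"
    "F"
)
ORF30_1 = (
    "MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP"
    "LAILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGGVGAAGKVVSFAKYGREI"
    "KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNR"
    "F"
)

print(round(percent_identity(ORF30A, ORF30_1), 1))  # 97.8
```

Here 177 of the 181 columns are identical; the four differences are the undetermined X residues in ORF30a.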

Homology with a predicted ORF from N. gonorrhoeae 

ORF30 shows 97.6% identity over a 42aa overlap with a predicted ORF (ORF30.ng) from N. 
gonorrhoeae: 

orf30.pep  MKKQITAAVMMLSMIAPAMANGLDNQAFEDQMFHTRADAPMQ 42
           ||||||||||||||||||||||||||||||| ||||||||||
orf30ng    MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP 60

The complete length ORF30ng nucleotide sequence <SEQ ID 179> is:

1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATCGCCCC 

51 CGCAATGGCA AACGGATTGG ACAATCAGGC ATTTGAAGAC CAAGTGTTCC 

101 ACACGCGGGC AGATGCGCCG ATGCAGTTGG CGGAGCTTTC TCAGAAGGAG 

151 ATGAAGGAGA CTGAAGGGGC TTTTCTTCCA TTGGCTATCT TGGGTGGTGC 

201 TGCCATTGGT ATGTGGACAC AGCATGGTTT TAGTTATGCA ACGACAGGCA 

251 GACCAGCTTC TGTTAGAGAT GTTGCTGGCG GATTAGGCGC AATTCCTGGT 

301 GATGTAGGTG CTGCAGGAAA GGTTGTTTCC TTTGCTAAAT ATGGACGTGA 

351 GATTAAAATC GGCAATAATA TGCGGATAGC CCCTTTCGGT AATAGAACAG 

401 GTCATCCTAT TGGAAAATTT CCCCATTATC ATCGTCGAGT TACGGATAAT

451 ACGGGCAAGA CTTTGCCTGG ACAGGGAATT GGTCGTCATC GCCCTTGGGA

501 ATCAAAATCT ACGGACAGAT CATGGAAAAA CCGCTTCTAA 

This encodes a protein having amino acid sequence <SEQ ID 180>: 



1 MKKQITAAVM MLSMIAPAMA NGLDNQAFED QVFHTRADAP MQLAELSQKE
51 MKETEGAFLP LAILGGAAIG MWTQHGFSYA TTGRPASVRD VAGGLGAIPG
101 DVGAAGKVVS FAKYGREIKI GNNMRIAPFG NRTGHPIGKF PHYHRRVTDN
151 TGKTLPGQGI GRHRPWESKS TDRSWKNRF*

ORF30ng and ORF30-1 show 98.3% identity in 181 aa overlap: 



                     10        20        30        40        50        60
orf30ng.pep  MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf30-1      MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP
                     10        20        30        40        50        60

                     70        80        90         100       110
orf30ng.pep  LAILGGAAIGMWTQHGFSYATTGRPASVRDVA--GGLGAIPGDVGAAGKVVSFAKYGREI
             ||||||||||||||||||||||||||||||||  |||||||| |||||||||||||||||
orf30-1      LAILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGGVGAAGKVVSFAKYGREI
                     70        80        90       100       110       120

            120       130       140       150       160       170
orf30ng.pep  KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNR
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf30-1      KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNR
                    130       140       150       160       170       180

            180
orf30ng.pep  FX
             ||
orf30-1      FX

Based on this analysis, including the presence of a putative leader sequence in the gonococcal
protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes,
could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 22 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 181>: 

1 ATGAATAAAA CTCTCTATCG TGTAATTTTC AACCGCAAAC GTGGGGCTGT 

51 GrTAGCCGTT GCTGAAACTA CCAAGCGCGA AGGTAAAAGC TGTGCCGATA 

101 GTGATTCAGG CAGCGCTCAT GTGAAATCTG TTCCTTTTGG TACTACTCAT 

151 GCACCTGTTT GTg . CGTTaC AAATATCTTT TCTTTTTCTT TATTGGGCTT 

201 TTCTTTATGT TTGGCTGTAG GtacGGyCAA TATTGCTTTT GCTGATGGCA 

251 TT. . 

This corresponds to the amino acid sequence <SEQ ID 182; ORF31>: 



1 MNKTLYRVIF NRKRGAVXAV AETTKREGKS CADSDSGSAH VKSVPFGTTH 
51 APVCXVTNIF SFSLLGFSLC LAVGTXNIAF ADGI..

Further work revealed a further partial nucleotide sequence <SEQ ID 183>: 



1 ATGAATAAAA CTCTCTATCG TGTAATTTTC AACCGCAAAC GTGGGGCTGT 

51 GGTAGCCGTT GCTGAAACTA CCAAGCGCGA AGGTAAAAGC TGTGCCGATA 

101 GTGATTCAGG CAGCGCTCAT GTGAAATCTG TTCCTTTTGG TACTACTCAT 

151 GCACCTGTTT GTCGTTCAAA TATCTTTTCT TTTTCTTTAT TGGGCTTTTC 

201 TTTATGTTTG GCTGTAGGTA CGGCCAATAT TGCTTTTGCT GATGGCATT. . 

This corresponds to the amino acid sequence <SEQ ID 184; ORF31-1>:

1 MNKTLYRVIF NRKRGAVVAV AETTKREGKS CADSDSGSAH VKSVPFGTTH
51 APVCRSNIFS FSLLGFSLCL AVGTANIAFA DGI..



Computer analysis of this amino acid sequence gave the following results: 






Homology with a predicted ORF from N. gonorrhoeae

ORF31 shows 76.2% identity over a 84aa overlap with a predicted ORF (ORF31.ng) from N.
gonorrhoeae:

orf31.pep  MNKTLYRVIFNRKRGAVXAVAETTKREGKSCADSDSGSAHVKSVPFGTTHAPVCXVTNIF 60
           ||||||||||||||||| |||||||||||||||| |||  |||| |  ||         |
orf31ng    MNKTLYRVIFNRKRGAVVAVAETTKREGKSCADSGSGSVYVKSVSFIPTH------SKAF 54

orf31.pep  SFSLLGFSLCLAVGTXNIAFADGI 84
            || |||||||| || ||||||||
orf31ng    CFSALGFSLCLALGTVNIAFADGIITDKAAPKTQQATILQTGNGIPQVNIQTPTSAGVSV 114

The complete length ORF31ng nucleotide sequence <SEQ ID 185> is: 

1 ATGAACAAAA CCCTCTATCG TGTGATTTTC AACCGCAAAC GCGGTGCTGT 

51 GGTAGCTGTT GCCGAAACCA CCAAGCGCGA AGGTAAAAGC TGTGCCGATA 

101 GTGGTTCGGG CAGCGTTTAT GTGAAATCCG TTTCTTTCAT TCCTACTCAT 

151 TCCAAAGCCT TTTGTTTTTC TGCATTAGGC TTTTCTTTAT GTTTGGCTTT 

201 GGGTACGGTC AATATTGCTT TTGCTGACGG CATTATTACT GATAAAGCTG 

251 CTCCTAAAAC CCAACAAGCC ACGATTCTGC AAACAGGTaa cGGCATACCG 

301 CAAGTCAATA TTCAAACCCC TACTTCGGCA GGGGTTTCTG TTAATCAATA 

351 TGCCCAGTTT GATGTGGGTA ATCGCGGGGC GATTTTAAAC AACAGTCGCA 

401 GCAACACCCA AACACAGCTA GGCGGTTGGA TTCAAGGCAA TCCTTGGTTG 

451 ACAAGGGGCG AAGCACGTGT GGTTGTAAAC CAAATCAACA GCAGCCATCC

501 TTCACAACTG AATGGCTATA TTGAAGTGGG TGGACGACGT GCAGAAGTCG 

551 TTATTGCCAA TCCGGCAGGG ATTGCAGTCA ATGGTGGTGG TTTTATCAAT 

601 GCTTCCCGTG CCACTTTGAC GACAGGCCAA CCGCAATATC AAGCAGGAGA 

651 CTTTAGCGGC TTTAAGATAA GGCAAGGCAA TGCTGTAATC GCCGGACACG 

701 GTTTGGATGC CCGTGATACC GATTTCACAC GTATTCTTGT ATGCCAACAA 

751 AATCACCTTG ATCAGTACGG CCGAACAAGC AGGCATTCGT AA

This encodes a protein having amino acid sequence <SEQ ID 186>: 

1 MNKTLYRVIF NRKRGAVVAV AETTKREGKS CADSGSGSVY VKSVSFIPTH 
51 SKAFCFSALG FSLCLALGTV NIAFADGIIT DKAAPKTQQA TILQTGNGIP 
101 QVNIQTPTSA GVSVNQYAQF DVGNRGAILN NSRSNTQTQL GGWIQGNPWL 
151 TRGEARVVVN QINSSHPSQL NGYIEVGGRR AEVVIANPAG IAVNGGGFIN
201 ASRATLTTGQ PQYQAGDFSG FKIRQGNAVI AGHGLDARDT DFTRILVCQQ 
251 NHLDQYGRTS RHS* 

This gonococcal protein shares 50% identity over a 149aa overlap with the pore-forming
hemolysin-like HecA protein from Erwinia chrysanthemi (accession number L39897):

orf31ng   96 GNGIPQVNIQTPTSAGVSVNQYAQFDVGNRGAILNNSRSN-TQTQLGGWIQGNPWLTRGE 154
             GNG+P VNI TP ++G+S N+Y  F+V NRG ILNN  +  T +QLGG IQ NP L
HecA      45 GNGVPVVNIATPDASGLSHNRYHDFNVDNRGLILNNGTARLTPSQLGGLIQNNPNLNGRA 104

orf31ng  155 ARVVVNQINSSHPSQLNGYIEVGGRRAEVVIANPAGIAVNGGGFINASRATLTTGQPQYQ 214
             A  ++N++ S + S+L GY+EV G+ A VV+ANP GI  +G GF+N  R TLTTG PQ+
HecA     105 AAAILNEVVSPNRSRLAGYLEVAGQAANVVVANPYGITCSGCGFLNTPRLTLTTGTPQFD 164

orf31ng  215 -AGDFSGFKIRQGNAVIAGHGLDARDTDF 242
              AG  SG  +R G+ +I G GLDA  +D+
HecA     165 AAGGLSGLDVRGGDILIDGAGLDASRSDY 193





Furthermore, ORF31ng and ORF31-1 show 79.5% identity in 83 aa overlap:

                     10        20        30        40        50        60
orf31-1.pep  MNKTLYRVIFNRKRGAVVAVAETTKREGKSCADSDSGSAHVKSVPFGTTHAPVCRSNIFS
             |||||||||||||||||||||||||||||||||| |||  |||| |  ||     |  |
orf31ng      MNKTLYRVIFNRKRGAVVAVAETTKREGKSCADSGSGSVYVKSVSFIPTH-----SKAFC
                     10        20        30        40        50

                     70        80
orf31-1.pep  FSLLGFSLCLAVGTANIAFADGI
             || |||||||| || ||||||||
orf31ng      FSALGFSLCLALGTVNIAFADGIITDKAAPKTQQATILQTGNGIPQVNIQTPTSAGVSVN
                60        70        80        90       100       110

On this basis, including the homology with hemolysins, and also with adhesins, it is predicted that
the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens
for vaccines or diagnostics, or for raising antibodies.



Example 23 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 187>: 



1 ATGAATACTC CTCCTTTTGT CTGTTGGATT TTTTGCAAGG TCATCGACAA
51 TTTCGGCGAC ATCGGCGTTT CGTGGCGGCT CGCCCGTGTT TTGCACCGCG
101 AACTCGGTTG GCAGGTGCAT TTGTGGACGG ACGATGTGTC CGCCTTGCGT
151 GCGCTTTGCC CTGATTTGCC CGATGTTCCC TGCGTTCATC AGGATATTCA
201 TGTCCGCACT TGGCATTCCG ATGCGGCAGA TATTGATACC GCG..



This corresponds to the amino acid sequence <SEQ ID 188; ORF32>: 



1 MNTPPFVCWI FCKVIDNFGD IGVSWRLARV LHRELGWQVH LWTDDVSALR
51 ALCPDLPDVP CVHQDIHVRT WHSDAADIDT A..



Further work revealed the complete nucleotide sequence <SEQ ID 189>: 



1 ATGAATACTC CTCCTTTTGT CTGTTGGATT TTTTGCAAGG TCATCGACAA
51 TTTCGGCGAC ATCGGCGTTT CGTGGCGGCT CGCCCGTGTT TTGCACCGCG
101 AACTCGGTTG GCAGGTGCAT TTGTGGACGG ACGATGTGTC CGCCTTGCGT
151 GCGCTTTGCC CTGATTTGCC CGATGTTCCC TGCGTTCATC AGGATATTCA
201 TGTCCGCACT TGGCATTCCG ATGCGGCAGA TATTGATACC GCGCCTGTTC
251 CCGATGTCGT CATCGAAACT TTTGCCTGCG ACCTGCCCGA AAATGTGCTG
301 CACATTATCC GCCGACACAA GCCGCTTTGG CTGAATTGGG AATATTTGAG
351 CGCGGAGGAA AGCAATGAAA GGCTGCATCT GATGCCTTCG CCGCAGGAGG
401 GTGTTCAAAA ATATTTTTGG TTTATGGGTT TCAGCGAAAA AAGCGGCGGG
451 TTGATACGCG AACGTGATTA CTGCGAAGCC GTCCGTTTCG ATACTGAAGC
501 CCTGCGAGAG CGGCTGATGC TGCCCGAAAA AAACGCCTCC GAATGGCTGC
551 TTTTCGGCTA TCGGAGCGAT GTTTGGGCAA AGTGGCTGGA AATGTGGCGA
601 CAGGCAGGCA GCCCGATGAC ACTGTTGCTG GCGGGGACGC AAATCATCGA
651 CAGCCTCAAA CAAAGCGGCG TTATTCCGCA AGATGCCCTG CAAAACGACG
701 GCGATGTTTT TCAGACGGCA TCCGTCCGCC TCGTCAAAAT CCCTTTCGTG
751 CCGCAACAGG ACTTCGACCA ACTGCTGCAC CTTGCCGACT GCGCCGTCAT
801 CCGCGGCGAA GACAGTTTCG TGCGCGCCCA GCTTGCGGGC AAACCCTTCT
851 TTTGGCACAT CTACCCGCAA GACGAGAATG TCCATCTCGA CAAACTCCAC
901 GCCTTTTGGG ATAAGGCACA CGGTTTCTAC ACGCCCGAAA CCGTGTCGGC
951 ACACCGCCGT CTTTCGGACG ACCTCAACGG CGGAGAGGCT TTATCCGCAA
1001 CACAACGCCT CGAATGTTGG CAAACCCTGC AACAACATCA AAACGGCTGG
1051 CGGCAAGGCG CGGAGGATTG GAGCCGTTAT CTTTTCGGGC AGCCGTCAGC
1101 TCCTGAAAAA CTCGCTGCCT TTGTTTCAAA GCATCAAAAA ATACGCTAG



This corresponds to the amino acid sequence <SEQ ID 190; ORF32-1>:

1 MNTPPFVCWI FCKVIDNFGD IGVSWRLARV LHRELGWQVH LWTDDVSALR
51 ALCPDLPDVP CVHQDIHVRT WHSDAADIDT APVPDVVIET FACDLPENVL
101 HIIRRHKPLW LNWEYLSAEE SNERLHLMPS PQEGVQKYFW FMGFSEKSGG
151 LIRERDYCEA VRFDTEALRE RLMLPEKNAS EWLLFGYRSD VWAKWLEMWR
201 QAGSPMTLLL AGTQIIDSLK QSGVIPQDAL QNDGDVFQTA SVRLVKIPFV
251 PQQDFDQLLH LADCAVIRGE DSFVRAQLAG KPFFWHIYPQ DENVHLDKLH
301 AFWDKAHGFY TPETVSAHRR LSDDLNGGEA LSATQRLECW QTLQQHQNGW
351 RQGAEDWSRY LFGQPSAPEK LAAFVSKHQK IR*
QTLQQHQNGW 



Computer analysis of this amino acid sequence gave the following results:

Homology with a predicted ORF from N. meningitidis (strain A)

ORF32 shows 93.8% identity over a 81aa overlap with an ORF (ORF32a) from strain A of N.
meningitidis:

                   10        20        30        40        50        60
orf32.pep  MNTPPFVCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDVP
           ||||||    |||||||||||||||||||||||||||||||||||||||||||||||||
orf32a     MNTPPFSAGXFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDVX
                   10        20        30        40        50        60

                   70        80
orf32.pep  CVHQDIHVRTWHSDAADIDTA
           |||||||||||||||||||||
orf32a     CVHQDIHVRTWHSDAADIDTAPVXDVVIETFACDLPENVLHIIRRHKPLWLXWEYLSAEX
                   70        80        90       100       110       120



The complete length ORF32a nucleotide sequence <SEQ ID 191> is:

1 ATGAATACTC CTCCTTTTTC TGCTGGANTT TTTTGCAAGG TCATCGACAA
51 TTTCGGCGAC ATCGGCGTTT CGTGGCGGCT TGCCCGTGTT TTGCACCGCG
101 AACTCGGTTG GCAGGTGCAT TTGTGGACGG ACGATGTGTC CGCCTTGCGT
151 GCGCTTTGCC CTGATTTGCC CGATGTTCNC TGCGTTCATC AGGATATTCA
201 TGTCCGCACT TGGCATTCCG ATGCGGCAGA TATTGATACC GCGCCTGTTC
251 NCGATGTCGT CATCGAAACT TTTGCCTGCG ACCTGCCCGA AAATGTGCTG
301 CACATCATCC GCCGACACAA GCCGCTTTGG CTGAANTGGG AATATTTGAG
351 CGCGGAGGAN AGCAATGAAA GGCTGCACNT GATGCCTTCG CCGCAGGAGA
401 GTGTTCNAAA ATANTTTTGG TTTATGGGTT TCAGCGAANN NAGCGGCGGA
451 CTGATACGCG AACGCGATTA CTGCGAAGCC GTCCGTTTCG ATAGCGGAGC
501 CTTGCGCAAG AGGCTGATGC TTCCCGAAAA AAACGNCCCC GAATGGCTGC
551 TTTTCGGCTA TCGGAGCGAT GTTTGGGCAA AGTGGCTGGA AATGTGGCGA
601 CAGGCAGGCA GTCCGTTGAC ACTTTTGCTG GCNGGGGCGC ANATTATCGA
651 CAGCCTCAAA CAAAACGGCG TTATTCCGCA AGATGCCCTG CAAAACGACG
701 GCGATGTTTT TCAGACGGCA TCCGTCCGCC TCGTCAAAAT CCCTTTCGTG
751 CCGCAACAGG ACTTCGACAA ACTGCTGCAC CTTGCCGACT GCGCCGTCAT
801 CCGCGGCGAA GACAGTTTCG TGCGCGCCCA GCTTGCGGGC AAACCCTTCT
851 TTTGGCACAT CTACCCGCAA GATGAGAATG TCCATCTCGA CAAACTCCAC
901 GCCTTTTGGG ATAAGGCACA CGGTTTCTAC ACGCCCGAAA CCGCATCGGC
951 ACACCGCCGC CTTTCAGACG ACCTCAACGG CGGAGAGGCT TTATCCGCAA
1001 CACAACGCCT CGAATGTTGG CAAATCCTGC AACAACATCA AAACGGCTGG
1051 CGGCAAGGCG CGGAGGATTG GAGCCGTTAT CTTTTTGGGC AGCCTTCCGC
1101 ATCCGAAAAA CTCGCCGCCT TTGTTTCAAA GCATCAAAAA ATACGCTAG



This encodes a protein having amino acid sequence <SEQ ID 192>: 



1 MNTPPFSAGX FCKVIDNFGD IGVSWRLARV LHRELGWQVH LWTDDVSALR
51 ALCPDLPDVX CVHQDIHVRT WHSDAADIDT APVXDVVIET FACDLPENVL
101 HIIRRHKPLW LXWEYLSAEX SNERLHXMPS PQESVXKXFW FMGFSEXSGG
151 LIRERDYCEA VRFDSGALRK RLMLPEKNXP EWLLFGYRSD VWAKWLEMWR
201 QAGSPLTLLL AGAXIIDSLK QNGVIPQDAL QNDGDVFQTA SVRLVKIPFV
251 PQQDFDKLLH LADCAVIRGE DSFVRAQLAG KPFFWHIYPQ DENVHLDKLH
301 AFWDKAHGFY TPETASAHRR LSDDLNGGEA LSATQRLECW QILQQHQNGW
351 RQGAEDWSRY LFGQPSASEK LAAFVSKHQK IR*

ORF32a and ORF32-1 show 93.2% identity in 382 aa overlap: 

                     10        20        30        40        50        60
orf32-1.pep  MNTPPFVCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDVP
             ||||||    |||||||||||||||||||||||||||||||||||||||||||||||||
orf32a       MNTPPFSAGXFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDVX
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf32-1.pep  CVHQDIHVRTWHSDAADIDTAPVPDVVIETFACDLPENVLHIIRRHKPLWLNWEYLSAEE
             ||||||||||||||||||||||| ||||||||||||||||||||||||||| |||||||
orf32a       CVHQDIHVRTWHSDAADIDTAPVXDVVIETFACDLPENVLHIIRRHKPLWLXWEYLSAEX
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf32-1.pep  SNERLHLMPSPQEGVQKYFWFMGFSEKSGGLIRERDYCEAVRFDTEALRERLMLPEKNAS
             |||||| |||||| | | |||||||| |||||||||||||||||  ||| ||||||||
orf32a       SNERLHXMPSPQESVXKXFWFMGFSEXSGGLIRERDYCEAVRFDSGALRKRLMLPEKNXP
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf32-1.pep  EWLLFGYRSDVWAKWLEMWRQAGSPMTLLLAGTQIIDSLKQSGVIPQDALQNDGDVFQTA
             ||||||||||||||||||||||||| ||||||  ||||||| ||||||||||||||||||
orf32a       EWLLFGYRSDVWAKWLEMWRQAGSPLTLLLAGAXIIDSLKQNGVIPQDALQNDGDVFQTA
                    190       200       210       220       230       240

                    250       260       270       280       290       300
orf32-1.pep  SVRLVKIPFVPQQDFDQLLHLADCAVIRGEDSFVRAQLAGKPFFWHIYPQDENVHLDKLH
             |||||||||||||||| |||||||||||||||||||||||||||||||||||||||||||
orf32a       SVRLVKIPFVPQQDFDKLLHLADCAVIRGEDSFVRAQLAGKPFFWHIYPQDENVHLDKLH
                    250       260       270       280       290       300

                    310       320       330       340       350       360
orf32-1.pep  AFWDKAHGFYTPETVSAHRRLSDDLNGGEALSATQRLECWQTLQQHQNGWRQGAEDWSRY
             |||||||||||||| |||||||||||||||||||||||||| ||||||||||||||||||
orf32a       AFWDKAHGFYTPETASAHRRLSDDLNGGEALSATQRLECWQILQQHQNGWRQGAEDWSRY
                    310       320       330       340       350       360

                    370       380
orf32-1.pep  LFGQPSAPEKLAAFVSKHQKIRX
             ||||||| |||||||||||||||
orf32a       LFGQPSASEKLAAFVSKHQKIRX
                    370       380

Homology with a predicted ORF from N. gonorrhoeae 

ORF32 shows 95.1% identity over a 82aa overlap with a predicted ORF (ORF32.ng) from N. 
gonorrhoeae: 

orf32.pep    MNTPPF-VCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLP 57
             |||  | |||||||||||||||||||||||||||||||||||||||||||||||||||
orf32ng    MVMNTYAFPVCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLP 60

orf32.pep  DVPCVHQDIHVRTWHSDAADIDTA 81
           ||| ||||||||||||||||||||
orf32ng    DVPFVHQDIHVRTWHSDAADIDTAPVPDAVIETFACDLPENVLNIIRRHKPLWLNWEYLS 120

An ORF32ng nucleotide sequence <SEQ ID 193> was predicted to encode a protein having amino 
acid sequence <SEQ ID 194>: 

1 MVMNTYAFPV CWIFCKVIDN FGDIGVSWRL ARVLHRELGW QVHLWTDDVS 

51 ALRALCPDLP DVPFVHQDIH VRTWHSDAAD IDTAPVPDAV IETFACDLPE 

101 NVLNIIRRHK PLWLNWEYLS AEESNERLHL MPSPQEGVQK YFWFMGFSEK

151 SGGLIRERDY REAVRFDTEA LRRRLVLPEK NAPEWLLFGY RGDVWAKWLD 

201 MWQQAGSLMT LLLAGAQIID SLKQSGVIPQ NALQNEGGVF QTASVRLVKI 

251 PFVPQQDFDK LLHLADCAVI RGEDSFVRTQ LAGKPFFWHI YPQDENVHLD 

301 KLHAFWDKAY GFYTPETASV HRLLSDDLNG GEALSATQRL ECGVL* 

Further sequencing revealed the following DNA sequence <SEQ ID 195>:

1 ATGAATACAT ACGCTTTTCC TGTCTGTTGG ATTTTTTGCA AGGTCATCGA
51 CAATTTCGGC GACATCGGCG TTTCGTGGCG GCTCGCCCGT GTTTTGCACC
101 GCGAACTCGG TTGGCAGGTG CATTTGTGGA CGGACGACGT GTCCGCCTTG
151 CGCGCGCTTT GTCCCGATTT GCCCGATGTT CCCTTCGTTC ATCAGGATAT
201 TCATGTCCGC ACTTGGCATT CCGATGCGGC AGACATTGAT aCCGCGCCCG
251 TTCCCGATGC CGTTATCGAA ACTTTTGCCT GCGACCTGCC CGAAAATGTG
301 CTGAACATCA TCCGCCGACA CAAACCGCTT TGGCTGAATT GGGAATATTT
351 GAGCGCGGAG GAAAGCAATG AAAGGCTGCA CCTGATGCCT TCGCCGCAGG
401 AGGGCGTTCA AAAATATTTT TGGTTTATGG GTTTCAGCGA AAAAAGCGGC
451 GGGTTGATAC GCGAACGCGA TTACCGCGAA GCCGTCCGTT TCGATACCGA
501 AGCCCTGCGC CGGCGGCTGG TGCTGCCCGA AAAAAACGCC CCCGAATGGC
551 TGCTTTTCGG CTATCGGGGC GATGTTTGGG CAAAGTGGCT GGACATGTGG
601 CAACAGGCAG GCAGCCTGAT GACCCTACTG CTGGCGGGGG CGCAAATTAT
651 CGACAGCCTC AAACAAAGCG GCGTTATTCC GCAAAACGCC CTGCAAAAtg
701 aaggcgGTGT CTTTCagacG gcatccgTcC gccttGTCAA AAtcCCGTTC
751 GTGCcGCAAC AGGAcTTCGA CAAATTGCTG CAcctcgcCG ACTGCGCCGT
801 GATACGCGGC GAAGACAGTT TCGTGCGTAC CCAGCTTGCC GGAAAACCCT
851 TTTTTTGGCA CATCTACCCG CAAGACGAGA ATGTCCATCT CGACAAACTC
901 CACGCCTTTT GGGATAAGGC ATACGGCTTC TACACGCCCG AAACCGCATC
951 GGTGCACCGC CTCCTTTCGG ACGACCTCAA CGGCGGAGAG GCTTTATCCG
1001 CAACACAACG CCTCGAATGT TGGCAAACCC TGCAACAACA TCAAAACGGC
1051 TGGCGGCAAG GCGCGGAGGA TTGGAGCCGT TATCTTTTCG GGCAGCCTTC
1101 CGCATCCGAA AAACTCGCCG CCTTTGTTTC AAAGCATCAA AAAATACGCT
1151 AG



This encodes a protein having amino acid sequence <SEQ ID 196; ORF32ng-1>:

1 MNTYAFPVCW IFCKVIDNFG DIGVSWRLAR VLHRELGWQV HLWTDDVSAL
51 RALCPDLPDV PFVHQDIHVR TWHSDAADID TAPVPDAVIE TFACDLPENV
101 LNIIRRHKPL WLNWEYLSAE ESNERLHLMP SPQEGVQKYF WFMGFSEKSG
151 GLIRERDYRE AVRFDTEALR RRLVLPEKNA PEWLLFGYRG DVWAKWLDMW
201 QQAGSLMTLL LAGAQIIDSL KQSGVIPQNA LQNEGGVFQT ASVRLVKIPF
251 VPQQDFDKLL HLADCAVIRG EDSFVRTQLA GKPFFWHIYP QDENVHLDKL
301 HAFWDKAYGF YTPETASVHR LLSDDLNGGE ALSATQRLEC WQTLQQHQNG
351 WRQGAEDWSR YLFGQPSASE KLAAFVSKHQ KIR*



ORF32ng-1 and ORF32-1 show 93.5% identity in 383 aa overlap:

                      10        20        30        40        50       59
orf32-1.pep  MNTPPF-VCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDV
             |||  | |||||||||||||||||||||||||||||||||||||||||||||||||||||
orf32ng-1    MNTYAFPVCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDV
                     10        20        30        40        50        60

            60        70        80        90       100       110      119
orf32-1.pep  PCVHQDIHVRTWHSDAADIDTAPVPDVVIETFACDLPENVLHIIRRHKPLWLNWEYLSAE
             | |||||||||||||||||||||||| |||||||||||||| ||||||||||||||||||
orf32ng-1    PFVHQDIHVRTWHSDAADIDTAPVPDAVIETFACDLPENVLNIIRRHKPLWLNWEYLSAE
                     70        80        90       100       110       120

           120       130       140       150       160       170      179
orf32-1.pep  ESNERLHLMPSPQEGVQKYFWFMGFSEKSGGLIRERDYCEAVRFDTEALRERLMLPEKNA
             |||||||||||||||||||||||||||||||||||||| ||||||||||| || ||||||
orf32ng-1    ESNERLHLMPSPQEGVQKYFWFMGFSEKSGGLIRERDYREAVRFDTEALRRRLVLPEKNA
                    130       140       150       160       170       180

           180       190       200       210       220       230      239
orf32-1.pep  SEWLLFGYRSDVWAKWLEMWRQAGSPMTLLLAGTQIIDSLKQSGVIPQDALQNDGDVFQT
              |||||||| ||||||| || |||| ||||||| |||||||||||||| |||| | ||||
orf32ng-1    PEWLLFGYRGDVWAKWLDMWQQAGSLMTLLLAGAQIIDSLKQSGVIPQNALQNEGGVFQT
                    190       200       210       220       230       240

           240       250       260       270       280       290      299
orf32-1.pep  ASVRLVKIPFVPQQDFDQLLHLADCAVIRGEDSFVRAQLAGKPFFWHIYPQDENVHLDKL
             ||||||||||||||||| |||||||||||||||||| |||||||||||||||||||||||
orf32ng-1    ASVRLVKIPFVPQQDFDKLLHLADCAVIRGEDSFVRTQLAGKPFFWHIYPQDENVHLDKL
                    250       260       270       280       290       300

           300       310       320       330       340       350      359
orf32-1.pep  HAFWDKAHGFYTPETVSAHRRLSDDLNGGEALSATQRLECWQTLQQHQNGWRQGAEDWSR
             ||||||| ||||||| | || |||||||||||||||||||||||||||||||||||||||
orf32ng-1    HAFWDKAYGFYTPETASVHRLLSDDLNGGEALSATQRLECWQTLQQHQNGWRQGAEDWSR
                    310       320       330       340       350       360

           360       370       380
orf32-1.pep  YLFGQPSAPEKLAAFVSKHQKIRX
             |||||||| |||||||||||||||
orf32ng-1    YLFGQPSASEKLAAFVSKHQKIRX
                    370       380
On this basis, including the RGD sequence in the gonococcal protein, characteristic of adhesins,
it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could
be useful antigens for vaccines or diagnostics, or for raising antibodies.
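
The RGD tripeptide referred to above can be located with a simple substring scan. The following is a minimal sketch; the function name `find_motif` is illustrative, and the two 20-residue fragments are taken from the sequences listed above (residues 181-200 of ORF32ng-1 and residues 180-199 of ORF32-1).

```python
from typing import List


def find_motif(protein: str, motif: str = "RGD") -> List[int]:
    """Return the 1-based start position of every occurrence of motif."""
    positions = []
    start = protein.find(motif)
    while start != -1:
        positions.append(start + 1)
        start = protein.find(motif, start + 1)
    return positions


GONOCOCCAL_FRAGMENT = "PEWLLFGYRGDVWAKWLDMW"     # ORF32ng-1, residues 181-200
MENINGOCOCCAL_FRAGMENT = "SEWLLFGYRSDVWAKWLEMW"  # ORF32-1, residues 180-199

print(find_motif(GONOCOCCAL_FRAGMENT))     # [9]
print(find_motif(MENINGOCOCCAL_FRAGMENT))  # []
```

Position 9 of the gonococcal fragment corresponds to residue 189 of ORF32ng-1; the meningococcal fragment carries RSD at the equivalent position.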

ORF32-1 (42kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
7A shows the results of affinity purification of the His-fusion protein, and Figure 7B shows the
results of expression of the GST-fusion in E. coli. Purified His-fusion protein was used to immunise
mice, whose sera were used for ELISA, giving a positive result. These experiments confirm that
ORF32-1 is a surface-exposed protein, and that it is a useful immunogen.
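
The quoted size of about 42kDa is consistent with the length of ORF32-1. The following is a rough sketch only, assuming the common rule of thumb of about 110 Da per amino acid residue; that average, and the function name `approx_mass_kda`, are assumptions and do not come from the original text.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average residue mass (assumption)


def approx_mass_kda(n_residues: int) -> float:
    """Estimate the mass of a protein in kDa from its residue count."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


# ORF32-1 <SEQ ID 190> has 382 residues.
print(round(approx_mass_kda(382), 1))  # 42.0
```

An exact figure would require summing the individual residue masses, and a fusion tag would add to the apparent size on SDS-PAGE.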

Example 24 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 197>: 

1 . . TTGTTCCTGC GTGTNAAAGT GGGGCGTTTT TTCAGCAGTC CGGCGACGTG 

51 GTTTCGGGNC AAAGACCCTG TAAATCAGGC GGTGTTGCGG CTGTATNCGG 

101 ACGAGTGGCG GCA. ACTTCG GTACGTTGGA AAATAGNCGC AACGTCGCAC 

151 AGCCTGTGGC TCTGCACGCT GCTCGGAATG CTGGTGTCGG TATTGTTGCT 

201 GCTTTTGGTG CGGCAATATA CGTTCAACTG GGAAAGCACG CTGTTGAGCA 

251 ATGCCGCTTC GGTACGCGCG GTGGAAATGT TGGCATGGCT GCCGTCGAAA 

301 CTCGGTTTCC CTGTCCCCGA TGCGCGGTCG GTCATCGAAG GCCGTCTGAA 

351 CGGCAATATT GCCGATGCGC GGGCTTGGTC GGGGCTGCTG GTCGNCAGTA 

401 TCGCCTGCTA NGGCATCCTG CCGCGCCTG. . 

This corresponds to the amino acid sequence <SEQ ID 198; ORF33>: 

1 ..LFLRVKVGRF FSSPATWFRX KDPVNQAVLR LYXDEWRXTS VRWKIXATSH
51 SLWLCTLLGM LVSVLLLLLV RQYTFNWEST LLSNAASVRA VEMLAWLPSK
101 LGFPVPDARS VIEGRLNGNI ADARAWSGLL VXSIACXGIL PRL..

Further work revealed the complete nucleotide sequence <SEQ ID 199>: 

1 ATGTTGAATC CATCCCGAAA ACTGGTTGAG CTGGTCCGTA TTTTGGACGA 

51 AGGCGGTTTT ATTTTCAGCG GCGATCCCGT ACAGGCGACG GAGGCTTTGC 

101 GCCGCGTGGA CGGCAGTACG GAGGAAAAAA TCATCCGTCG GGCGGAGATG 

151 ATTGACAGGA ACCGTATGCT GCGGGAGACG TTGGAACGTG TGCGTGCGGG 

201 GTCGTTCTGG TTGTGGGTGG TGGCGGCGAC GTTTGCATTT TTTACCGGTT

251 TTTCAGTCAC TTATCTTCTA ATGGACAATC AGGGTCTGAA TTTCTTTTTG 

301 GTTTTGGCGG GCGTGTTGGG CATGAATACG CTGATGCTGG CAGTATGGTT 

351 GGCAATGTTG TTCCTGCGTG TGAAAGTGGG GCGTTTTTTC AGCAGTCCGG 

401 CGACGTGGTT TCGGGGCAAA GACCCTGTAA ATCAGGCGGT GTTGCGGCTG 

451 TATGCGGACG AGTGGCGGCA ACCTTCGGTA CGTTGGAAAA TAGGCGCAAC 

501 GTCGCACAGC CTGTGGCTCT GCACGCTGCT CGGAATGCTG GTGTCGGTAT 

551 TGTTGCTGCT TTTGGTGCGG CAATATACGT TCAACTGGGA AAGCACGCTG 

601 TTGAGCAATG CCGCTTCGGT ACGCGCGGTG GAAATGTTGG CATGGCTGCC 

651 GTCGAAACTC GGTTTCCCTG TCCCCGATGC GCGGGCGGTC ATCGAAGGCC 

701 GTCTGAACGG CAATATTGCC GATGCGCGGG CTTGGTCGGG GCTGCTGGTC 

751 GGCAGTATCG CCTGCTACGG CATCCTGCCG CGCCTGCTGG CTTGGGTAGT 

801 GTGTAAAATC CTTTTGAAAA CAAGCGAAAA CGGATTGGAT TTGGAAAAGC 

851 CCTATTATCA GGCGGTCATC CGCCGCTGGC AGAACAAAAT CACCGATGCG 

901 GATACGCGTC GGGAAACCGT GTCCGCCGTT TCACCGAAAA TCATCTTGAA 

951 CGATGCGCCG AAATGGGCGG TCATGCTGGA GACCGAGTGG CAGGACGGCG 

1001 AATGGTTCGA GGGCAGGCTG GCGCAGGAAT GGCTGGATAA GGGCGTTGCC

1051 ACCAATCGGG AACAGGTTGC CGCGCTGGAG ACAGAGCTGA AGCAGAAACC 

1101 GGCGCAACTG CTTATCGGCG TGCGCGCCCA AACTGTGCCG GACCGCGGCG 

1151 TGTTGCGGCA GATTGTCCGA CTCTCGGAAG CGGCGCAGGG CGGCGCGGTG 
1201 GTGCAGCTTT TGGCGGAACA GGGGCTTTCA GACGACCTTT CGGAAAAGCT

1251 GGAACATTGG CGTAACGCGC TGGCCGAATG CGGCGCGGCG TGGCTTGAGC 

1301 CTGACAGGGC GGCGCAGGAA GGGCGTTTGA AAGACCAATA A 

This corresponds to the amino acid sequence <SEQ ID 200; ORF33-1>:

1 MLNPSRKLVE LVRILDEGGF IFSGDPVQAT EALRRVDGST EEKIIRRAEM
51 IDRNRMLRET LERVRAGSFW LWVVAATFAF FTGFSVTYLL MDNQGLNFFL
101 VLAGVLGMNT LMLAVWLAML FLRVKVGRFF SSPATWFRGK DPVNQAVLRL
151 YADEWRQPSV RWKIGATSHS LWLCTLLGML VSVLLLLLVR QYTFNWESTL
201 LSNAASVRAV EMLAWLPSKL GFPVPDARAV IEGRLNGNIA DARAWSGLLV
251 GSIACYGILP RLLAWVVCKI LLKTSENGLD LEKPYYQAVI RRWQNKITDA
301 DTRRETVSAV SPKIILNDAP KWAVMLETEW QDGEWFEGRL AQEWLDKGVA
351 TNREQVAALE TELKQKPAQL LIGVRAQTVP DRGVLRQIVR LSEAAQGGAV
401 VQLLAEQGLS DDLSEKLEHW RNALAECGAA WLEPDRAAQE GRLKDQ*
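
The position of the partial ORF33 sequence within the complete ORF33-1 sequence can be found by a scan that treats each undetermined residue (X) as a wildcard. The following is a minimal sketch; the function name `locate_partial` is illustrative, and only the first 30 residues of ORF33 and residues 111-150 of ORF33-1 are used.

```python
from typing import List


def locate_partial(partial: str, complete: str, wildcard: str = "X") -> List[int]:
    """Return 0-based offsets where partial matches complete; wildcard matches anything."""
    hits = []
    for offset in range(len(complete) - len(partial) + 1):
        window = complete[offset:offset + len(partial)]
        if all(p == wildcard or p == c for p, c in zip(partial, window)):
            hits.append(offset)
    return hits


PARTIAL = "LFLRVKVGRFFSSPATWFRXKDPVNQAVLR"             # ORF33, residues 1-30
COMPLETE = "LMLAVWLAMLFLRVKVGRFFSSPATWFRGKDPVNQAVLRL"  # ORF33-1, residues 111-150

offsets = locate_partial(PARTIAL, COMPLETE)
print([111 + offset for offset in offsets])  # [120]
```

The partial sequence therefore begins at residue 120 of ORF33-1, with the X at position 20 of ORF33 falling on G139.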

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF33 shows 90.9% identity over a 143aa overlap with an ORF (ORF33a) from strain A of N. 
meningitidis: 

                                                 10        20        30
orf33.pep                                LFLRVKVGRFFSSPATWFRXKDPVNQAVLR
                                         ||||||||||||||||||| ||||||||||
orf33a     LMDNQGLNFFLVLAGVXGMNTLMLAVWLAMLFLRVKVGRFFSSPATWFRGKDPVNQAVLR
          90       100       110       120       130       140

                   40        50        60        70        80        90
orf33.pep  LYXDEWRXTSVRWKIXATSHSLWLCTLLGMLVSVLLLLLVRQYTFNWESTLLSNAASVRA
           || ||||| |||||| ||||||||||||||||||||||||||||||||||||    |||
orf33a     LYADEWRXPSVRWKIGATSHSLWLCTLLGMLVSVLLLLLVRQYTFNWESTLLGDSSSVRL
         150       160       170       180       190       200

                  100       110       120       130       140
orf33.pep  VEMLAWLPSKLGFPVPDARSVIEGRLNGNIADARAWSGLLVXSIACXGILPRL
           |||||||| |||||||||| ||||||||||||||||||||| |||| ||||||
orf33a     VEMLAWLPAKLGFPVPDARAVIEGRLNGNIADARAWSGLLVGSIACYGILPRLLAWAVCK
         210       220       230       240       250       260

orf33a     ILXXTSENGLDLEKXXXXXXIRRWQNKITDADTRRETVSAVSPKIVLNDAPKWAVMLETE
         270       280       290       300       310       320

The complete length ORF33a nucleotide sequence <SEQ ID 201> is:



1 ATGTTGAATC CATCCCGAAA ACTGGTTGAG CTGGTCCGTA TTTTGGAAGA 

51 AGGCGGCTTT ATTTTCAGCG GCGATCCCGT GCAGGCGACG GAGGCTTTGC 

101 GCCGCGTGGA CGGCAGTACG GAGGAAAAAA TCATCCGTCG GGCGAAGATG 

151 ATCGACAGGA ACCGTATGCT GCGGGAGACG TTGGAACGTG TGCGTGCGGG 

201 GTCGTTCTGG TTGTGGGTGG CGGCGGCGAC GTTTGCGTTT NTTACCGNTT 

251 TTTCAGTTAC TTATCTTCTA ATGGACAATC AGGGTCTGAA TTTCTTTTTG 

301 GTTTTGGCGG GCGTGNTGGG CATGAATACG CTGATGCTGG CAGTATGGTT 

351 GGCAATGTTG TTCCTGCGCG TGAAAGTGGG GCGTTTTTTC AGCAGTCCGG 

401 CGACGTGGTT TCGGGGCAAA GACCCTGTCA ATCAGGCGGT GTTGCGGCTG

451 TATGCGGACG AGTGGCGGCN ACCTTCGGTA CGTTGGAAAA TAGGCGCAAC 

501 GTCGCACAGC CTGTGGCTCT GCACGCTGCT CGGAATGCTG GTGTCGGTAT 

551 TGTTGCTGCT TTTGGTGCGG CAATATACGT TCAACTGGGA AAGCACGCTG 

601 TTGGGCGATT CGTCTTCGGT ACGGCTGGTG GAAATGTTGG CATGGCTGCC 

651 TGCGAAACTG GGTTTTCCCG TGCCTGATGC GCGGGCGGTC ATCGAAGGTC 

701 GTCTGAACGG CAATATTGCC GATGCGCGGG CTTGGTCGGG GCTGCTGGTC 

751 GGCAGTATCG CCTGCTACGG CATCCTGCCG CGCCTCTTGG CTTGGGCGGT 

801 ATGCAAAATC CTTNTGNAAA CAAGCGAAAA CGGCTTGGAT TTGGAAAAGC 

851 NCNNNNNTCN NNCGNTCATC CGCCGCTGGC AGAACAAAAT CACCGATGCG 

901 GATACGCGTC GGGAAACCGT GTCCGCCGTT TCGCCGAAAA TCGTCTTGAA 

951 CGATGCGCCG AAATGGGCGG TCATGCTGGA GACCGAATGG CAGGACGGCG 

1001 AATGGTTCGA GGGCAGGCTG GCGCAGGAAT GGCTGGATAA GGGCGTTGCC 

1051 GCCAATCGGG AACAGGTTGC CGCGCTGGAG ACAGAGCTGA AGCAGAAACC 
1101 GGCGCAACTG CTTATCGGCG TGCGCGCCCA AACTGTGCCC GACCGCGGCG

1151 TGTTGCGGCA GATCGTCCGA CTTTCGGAAG CGGCGCAGGG CGGCGCGGTG 

1201 GTGCANCTTT TGGCGGAACA GGGGCTTTCA GACGACCTTT CGGAAAAGCT 

1251 GGAACATTGG CGTAACGCGC TGACCGAATG CGGCGCGGCG TGGCTGGAAC 

1301 CCGACAGAGC GGCGCAGGAA GGCCGTCTGA AAACCAACGA CCGCACTTGA 

This encodes a protein having amino acid sequence <SEQ ID 202>: 



1 MLNPSRKLVE LVRILEEGGF IFSGDPVQAT EALRRVDGST EEKIIRRAKM
51 IDRNRMLRET LERVRAGSFW LWVAAATFAF XTXFSVTYLL MDNQGLNFFL
101 VLAGVXGMNT LMLAVWLAML FLRVKVGRFF SSPATWFRGK DPVNQAVLRL
151 YADEWRXPSV RWKIGATSHS LWLCTLLGML VSVLLLLLVR QYTFNWESTL
201 LGDSSSVRLV EMLAWLPAKL GFPVPDARAV IEGRLNGNIA DARAWSGLLV
251 GSIACYGILP RLLAWAVCKI LXXTSENGLD LEKXXXXXXI RRWQNKITDA
301 DTRRETVSAV SPKIVLNDAP KWAVMLETEW QDGEWFEGRL AQEWLDKGVA
351 ANREQVAALE TELKQKPAQL LIGVRAQTVP DRGVLRQIVR LSEAAQGGAV
401 VXLLAEQGLS DDLSEKLEHW RNALTECGAA WLEPDRAAQE GRLKTNDRT*



ORF33a and ORF33-1 show 94.1% identity in 444 aa overlap:

                    10        20        30        40        50        60
orf33a.pep  MLNPSRKLVELVRILEEGGFIFSGDPVQATEALRRVDGSTEEKIIRRAKMIDRNRMLRET
            ||||||||||||||| |||||||||||||||||||||||||||||||| |||||||||||
orf33-1     MLNPSRKLVELVRILDEGGFIFSGDPVQATEALRRVDGSTEEKIIRRAEMIDRNRMLRET
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf33a.pep  LERVRAGSFWLWVAAATFAFXTXFSVTYLLMDNQGLNFFLVLAGVXGMNTLMLAVWLAML
            ||||||||||||| |||||| | |||||||||||||||||||||| ||||||||||||||
orf33-1     LERVRAGSFWLWVVAATFAFFTGFSVTYLLMDNQGLNFFLVLAGVLGMNTLMLAVWLAML
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf33a.pep  FLRVKVGRFFSSPATWFRGKDPVNQAVLRLYADEWRXPSVRWKIGATSHSLWLCTLLGML
            |||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||
orf33-1     FLRVKVGRFFSSPATWFRGKDPVNQAVLRLYADEWRQPSVRWKIGATSHSLWLCTLLGML
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf33a.pep  VSVLLLLLVRQYTFNWESTLLGDSSSVRLVEMLAWLPAKLGFPVPDARAVIEGRLNGNIA
            |||||||||||||||||||||    ||| |||||||| ||||||||||||||||||||||
orf33-1     VSVLLLLLVRQYTFNWESTLLSNAASVRAVEMLAWLPSKLGFPVPDARAVIEGRLNGNIA
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf33a.pep  DARAWSGLLVGSIACYGILPRLLAWAVCKILXXTSENGLDLEKXXXXXXIRRWQNKITDA
            ||||||||||||||||||||||||| |||||  ||||||||||      |||||||||||
orf33-1     DARAWSGLLVGSIACYGILPRLLAWVVCKILLKTSENGLDLEKPYYQAVIRRWQNKITDA
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf33a.pep  DTRRETVSAVSPKIVLNDAPKWAVMLETEWQDGEWFEGRLAQEWLDKGVAANREQVAALE
            |||||||||||||| ||||||||||||||||||||||||||||||||||| |||||||||
orf33-1     DTRRETVSAVSPKIILNDAPKWAVMLETEWQDGEWFEGRLAQEWLDKGVATNREQVAALE
                   310       320       330       340       350       360

                   370       380       390       400       410       420
orf33a.pep  TELKQKPAQLLIGVRAQTVPDRGVLRQIVRLSEAAQGGAVVXLLAEQGLSDDLSEKLEHW
            ||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||
orf33-1     TELKQKPAQLLIGVRAQTVPDRGVLRQIVRLSEAAQGGAVVQLLAEQGLSDDLSEKLEHW
                   370       380       390       400       410       420

                   430       440       450
orf33a.pep  RNALTECGAAWLEPDRAAQEGRLKTNDRTX
            |||| |||||||||||||||||||
orf33-1     RNALAECGAAWLEPDRAAQEGRLKDQX
                   430       440
430 440 
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Homology with a predicted ORF from N. gonorrhoeae 

ORF33 shows 91.6% identity over a 143aa overlap with a predicted ORF (ORF33.ng) from AT. 
gonorrhoeae: 



orf33.pep                                  LFLRVKVGRFFSSPATWFRXKDPVNQAVLR  30
                                           ||||||||||||||||||| | ||||||||
orf33ng      LMDNQGLNFFLVLAGVLGMNTLMLAVWLATLFLRVKVGRFFSSPATWFRGKGPVNQAVLR 100

orf33.pep    LYXDEWRXTSVRWKIXATSHSLWLCTLLGMLVSVLLLLLVRQYTFNWESTLLSNAASVRA  90
             || | ||  |||||| || |||||||||||||||||||||||||||||||||||||||||
orf33ng      LYADQWRQPSVRWKIGATAHSLWLCTLLGMLVSVLLLLLVRQYTFNWESTLLSNAASVRA 160

orf33.pep    VEMLAWLPSKLGFPVPDARSVIEGRLNGNIADARAWSGLLVXSIACXGILPRL 143
             ||||||||||||||||||| ||||||||||||||||||||| || | ||||||
orf33ng      VEMLAWLPSKLGFPVPDARAVIEGRLNGNIADARAWSGLLVGSIVCYGILPRLLAWVVCK 220



An ORF33ng nucleotide sequence <SEQ ID 203> was predicted to encode a protein having amino 
acid sequence <SEQ ID 204>: 



1 MIDRDRMLRD TLERVRAGSF WLWVVVASMM FTAGFSGTYL LMDNQGLNFF

51 LVLAGVLGMN TLMLAVWLAT LFLRVKVGRF FSSPATWFRG KGPVNQAVLR

101 LYADQWRQPS VRWKIGATAH SLWLCTLLGM LVSVLLLLLV RQYTFNWEST

151 LLSNAASVRA VEMLAWLPSK LGFPVPDARA VIEGRLNGNI ADARAWSGLL

201 VGSIVCYGIL PRLLAWVVCK ILLKTSENGL DLEKTYYQAV IRRWQNKITD

251 ADTRRETVSA VSPKIVLNDA PKWALMLETE WQDGQWFEGR LAQEWLDKGV

301 AANREQVAAL ETELKQKPAQ LLIGVRAQTV PDRGVLRQIV RLSEAAQGGA

351 VVQLLAEQGL SDDLSEKLEH WRNALTECGA AWLEPDRVAQ EGRLKDQ*

Further sequence analysis revealed the following DNA sequence <SEQ ID 205>: 



1 ATGTTGaatC CATCCCgaAA ACTGgttgag ctGgTCCgtA Ttttgaataa

51 agggggtTTT attttcagcg gcgatcctgt gcaggcgacg gaggctttgc

101 gccgcgtgga cggcAGTACG GAggAaaaaa tcttccgtcg GGCGGAGAtg

151 atcgACAGGg accgtatgtt gcgggACaCg TtggaacGTG TGCGTGCggg

201 gtcgtTctgG TTATGGGTGG TggtggCAtC gATGATGTtt aCCGCCGGAT

251 TTTCAGgcac ttatCttCTG ATGGACaatC AGGGGCtGAA TtTCTTTTTA

301 GTTTTggcgG GAGTGTtggG CATGaatacG ctgATGCTGG CAGTATGGtt

351 gGCAACGTTG TTCCTGCGCG TGAAAGTGGG ACGGTTTTTC AGCAGTCCGG

401 CGACGTGGTT TCGGGGCAAA GGCCCTGTAA ATCAGGCGGT GTTGCGGCTG

451 TATGCGGACC AGTGGCGGCA ACCTTCGGTA CGATGGAAAA TAGGCGCAAC

501 GGCGCACAGC TTGTGGCTCT GCACGCTGCT CGGAATGCTG GTGTCGGTAT

551 TGCTGCTGCT TTTGGTGCGG CAATATACGT TCAACTGGGA AAGCACGCTG

601 TTGAGCAATG CCGCTTCGGT ACGCGCGGTG GAAATGTTGG CATGGCTGCC

651 GTCGAAACTC GGTTTCCCTG TCCCCGATGC GCGGGCGGTC ATCGAAGGTC

701 GTCTGAACGG CAATATTGCC GATGCGCGGG CTTGGTCGGG GCTGCTGGTC

751 GGCAGTATCG TCTGCTACGG CATCCTGCCG CGCCTCTTGG CTTGGGTAGT

801 GTGTAAAATC CTTTTGAAAA CAAGCGAAAA CGGattgGAT TTGGAAAAAA

851 CCTATTATCA GGCGGTCATC CGCCGCTGGC AGAACAAAAT CACCGATGCG

901 GATACGCGTC GGGAAACCGT GTCCGCCGTT TCGCcgaAAA TCGTCTTGAA

951 CGATGCGCCG AAATGGGCGC TCATGCTGGA GACCGAGTGG CAGGACGGCC

1001 AATGGTTCGA GGGCAGGCTG GCGCAGGAAT GGCTGGATAA GGGCGTTGCC

1051 GCCAATCGGG AACAGGTTGC CGCGCTGGAG ACAGAGCTGA AGCAGAAACC

1101 GGCGCAACTG CTTATCGGCG TACGCGCCCA AACTGTGCCG GACCGGGGCG

1151 TGCTGCGGCA GATTGTGCGG CTTTCGGAAG CGGCGCAGGG CGGCGCGGTG

1201 GTGCAGCTTT TGGCGGAACA GGGGCTTTCA GACGACCTTT CGGAAAAGCT

1251 GGAACATTGG CGTAACGCGC TGACCGAATG CGGCGCGGCG TGGCTTGAGC

1301 CTGACAGGGT GGCGCAGGAA GGCCGTTTGA AAGACCAATA A

This encodes a protein having amino acid sequence <SEQ ID 206; ORF33ng-1>:



1 MLNPSRKLVE LVRILNKGGF IFSGDPVQAT EALRRVDGST EEKIFRRAEM

51 IDRDRMLRDT LERVRAGSFW LWVVVASMMF TAGFSGTYLL MDNQGLNFFL

101 VLAGVLGMNT LMLAVWLATL FLRVKVGRFF SSPATWFRGK GPVNQAVLRL

151 YADQWRQPSV RWKIGATAHS LWLCTLLGML VSVLLLLLVR QYTFNWESTL

201 LSNAASVRAV EMLAWLPSKL GFPVPDARAV IEGRLNGNIA DARAWSGLLV

251 GSIVCYGILP RLLAWVVCKI LLKTSENGLD LEKTYYQAVI RRWQNKITDA

301 DTRRETVSAV SPKIVLNDAP KWALMLETEW QDGQWFEGRL AQEWLDKGVA

351 ANREQVAALE TELKQKPAQL LIGVRAQTVP DRGVLRQIVR LSEAAQGGAV

401 VQLLAEQGLS DDLSEKLEHW RNALTECGAA WLEPDRVAQE GRLKDQ*
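The relationship between a nucleotide listing and the protein it encodes can be checked with a short script. The following is a minimal sketch, assuming the standard genetic code and reading frame 1; the function name `translate` is illustrative and is not taken from this specification.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1, ignoring whitespace and letter case."""
    seq = "".join(dna.split()).upper()
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        # Codons containing ambiguity codes (e.g. N) are reported as X.
        protein.append(CODON_TABLE.get(seq[pos:pos + 3], "X"))
    return "".join(protein)


# First 30 nucleotides of SEQ ID 205 as printed above.
print(translate("ATGTTGaatC CATCCCgaAA ACTGgttgag"))  # prints MLNPSRKLVE
```

Applied to a complete listing, the same function returns the full protein followed by `*` at the stop codon.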

ORF33ng-1 and ORF33-1 show 94.6% identity in 446 aa overlap:

                     10        20        30        40        50        60
orf33-1.pep  MLNPSRKLVELVRILDEGGFIFSGDPVQATEALRRVDGSTEEKIIRRAEMIDRNRMLRET
             |||||||||||||||  ||||||||||||||||||||||||||| |||||||| |||| |
orf33ng-1    MLNPSRKLVELVRILNKGGFIFSGDPVQATEALRRVDGSTEEKIFRRAEMIDRDRMLRDT
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf33-1.pep  LERVRAGSFWLWVVAATFAFFTGFSVTYLLMDNQGLNFFLVLAGVLGMNTLMLAVWLAML
             |||||||||||||| |   |  ||| |||||||||||||||||||||||||||||||| |
orf33ng-1    LERVRAGSFWLWVVVASMMFTAGFSGTYLLMDNQGLNFFLVLAGVLGMNTLMLAVWLATL
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf33-1.pep  FLRVKVGRFFSSPATWFRGKDPVNQAVLRLYADEWRQPSVRWKIGATSHSLWLCTLLGML
             |||||||||||||||||||| |||||||||||| ||||||||||||| ||||||||||||
orf33ng-1    FLRVKVGRFFSSPATWFRGKGPVNQAVLRLYADQWRQPSVRWKIGATAHSLWLCTLLGML
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf33-1.pep  VSVLLLLLVRQYTFNWESTLLSNAASVRAVEMLAWLPSKLGFPVPDARAVIEGRLNGNIA
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf33ng-1    VSVLLLLLVRQYTFNWESTLLSNAASVRAVEMLAWLPSKLGFPVPDARAVIEGRLNGNIA
                    190       200       210       220       230       240

                    250       260       270       280       290       300
orf33-1.pep  DARAWSGLLVGSIACYGILPRLLAWVVCKILLKTSENGLDLEKPYYQAVIRRWQNKITDA
             ||||||||||||| ||||||||||||||||||||||||||||| ||||||||||||||||
orf33ng-1    DARAWSGLLVGSIVCYGILPRLLAWVVCKILLKTSENGLDLEKTYYQAVIRRWQNKITDA
                    250       260       270       280       290       300

                    310       320       330       340       350       360
orf33-1.pep  DTRRETVSAVSPKIILNDAPKWAVMLETEWQDGEWFEGRLAQEWLDKGVATNREQVAALE
             |||||||||||||| |||||||| ||||||||| |||||||||||||||| |||||||||
orf33ng-1    DTRRETVSAVSPKIVLNDAPKWALMLETEWQDGQWFEGRLAQEWLDKGVAANREQVAALE
                    310       320       330       340       350       360

                    370       380       390       400       410       420
orf33-1.pep  TELKQKPAQLLIGVRAQTVPDRGVLRQIVRLSEAAQGGAVVQLLAEQGLSDDLSEKLEHW
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf33ng-1    TELKQKPAQLLIGVRAQTVPDRGVLRQIVRLSEAAQGGAVVQLLAEQGLSDDLSEKLEHW
                    370       380       390       400       410       420

                    430       440
orf33-1.pep  RNALAECGAAWLEPDRAAQEGRLKDQX
             |||| ||||||||||| ||||||||||
orf33ng-1    RNALTECGAAWLEPDRVAQEGRLKDQX
                    430       440
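The identity figure quoted for an alignment is the number of columns holding the same residue in both rows, divided by the length of the overlap. The sketch below illustrates that calculation on one block; it assumes gaps are written as `-` and that every column of the block counts toward the overlap. The function name `percent_identity` is illustrative only.

```python
def percent_identity(top: str, bottom: str, gap: str = "-"):
    """Return (identical columns, overlap length, percent identity) for two aligned rows."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    # A column is identical when both rows carry the same residue (gaps never match).
    identical = sum(1 for a, b in zip(top, bottom) if a == b and a != gap)
    overlap = len(top)
    return identical, overlap, round(100.0 * identical / overlap, 1)


# Final block of the alignment above, without the terminal stop symbol.
print(percent_identity("RNALAECGAAWLEPDRAAQEGRLKDQ", "RNALTECGAAWLEPDRVAQEGRLKDQ"))
# prints (24, 26, 92.3)
```

Concatenating all blocks of an alignment before calling the function gives the figure for the whole overlap.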



Based on the presence of several putative transmembrane domains in the gonococcal protein, it is
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.
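The specification does not state how putative transmembrane domains were identified. One common approach, shown here purely as an illustrative assumption, is a sliding-window hydropathy scan using the Kyte-Doolittle scale, with a 19-residue window and a mean score of about 1.6 as the conventional cut-off. The function names and the treatment of unknown residues are hypothetical choices.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(protein, window=19):
    """Return (start, mean hydropathy) for every full window; start is 1-based."""
    scores = []
    for start in range(len(protein) - window + 1):
        segment = protein[start:start + window]
        # Unknown residues such as X contribute 0 (an arbitrary neutral choice).
        mean = sum(KD.get(res, 0.0) for res in segment) / window
        scores.append((start + 1, round(mean, 2)))
    return scores


def candidate_tm_starts(protein, window=19, threshold=1.6):
    """Start positions of windows whose mean hydropathy reaches the threshold."""
    return [s for s, mean in hydropathy_windows(protein, window) if mean >= threshold]


segment = "RWKIGATAHSLWLCTLLGMLVSVLLLLLVRQYTFNWEST"  # residues 161-199 of SEQ ID 206
print(candidate_tm_starts(segment))
```

Window starts are relative to the segment passed in; adding 160 converts them to residue numbers of the full protein in this example.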



Example 25 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 207>:

1 ..CAGAAGAGTT TGTCGAGAAT TTCTTTATGG GGTTTGGGCG GCGTGTTTTT

51 CGGGGTGTCC GGTCTGGTAT GGTTTTCTTT GGGCGTTTCT TT.GAGTGCG

101 CCTGTTTTTC GGGTGTTTCT TTTCGGGGTT CGGGACGGGG GACGTTTGTG 

151 GGCAGTACGG GGGTTTCTTT GAGTGTGTTT TCAGCTTGTG TTCC.GGCGT 

201 CGTCCGGCTG CCTGTCGGTT TGAGCTGTGT CGGCAGGTTG CG..GTTTGA 

251 CCCGGTTTTT CTTGGGTGCG GCAGGGGACG TCATTCTCCT GCCGCTTTCG 

301 TCTGTGCCGT CCGGCTGTGC GGGTTCGGAT GAGGCGGCGT GGTGGTGTTC 

351 GGGTTGGGCG GCATCTTGTT CCGACTACGC CGTTTGGCAG CCAGAATTCG 

401 GTTTCGCGGG GGCTGTCGGT GTGTTGCGGT TCGGCTTGAA GGGTTTTGTC 

451 GTCC..

This corresponds to the amino acid sequence <SEQ ID 208; ORF34>: 

1 ..QKSLSRISLW GLGGVFFGVS GLVWFSLGVS XECACFSGVS FRGSGRGTFV

51 GSTGVSLSVF SACVXGVVRL PVGLSCVGRL XXLTRFFLGA AGDVILLPLS

101 SVPSGCAGSD EAAWWCSGWA ASCPTTPFGS QNSVSRGLSV CCGSA*RVLS 

151 S.. 

Further work revealed the complete nucleotide sequence <SEQ ID 209>: 

1 ATGATGATGC CGTTCATAAT GCTTCCTTGG ATTGCkGGTG TGCCTGCCGT 

51 GCCGGGTCAG AATAGGTTGT CCAGAATTTC TTTATGGGGT TTGGGCGGCG 

101 TGTTTTTCGG GGTGTCCGGT TTGGTATGGT TTTCTTTGGG CGTTTCTTTG 

151 GGCTGCGCCT GTTTTTCGGG TGTTTCTTTT CGGGGTTCGG GACGGGGGAC 

201 GTTTGTGGGC AGTACGGGGG TTTCTTTGAG TGTGTTTTCA GCTTGTGTTC 

251 CGGCGTCGTC CGGCTGCCTG TCGGTTTGAG CTGTGTCGGC AGGTTGCGGT 

301 TTGACCCGGT TTTTCTTGGG TGCGGCAGGG GACGGCAGTC CGCTGCCGCT 

351 TTCGTCTGTG CCGTCCGGCT GTGCGGGTTC GGATGAGGCG GCGTGGTGGT 

401 GTTCGGGTTG GGCGGCATCT TGTCCGACTA CGCCGTTTGG CAGCCAGAAT

451 TCGGTTTCGC GGGGGCTGTC GGTGTGTTGC GGTTCGGCTT GAAGGGTTTT

501 GTCGCCGTTC GGGTTGAATG TGCTGACGAT GCCTATTGCC AATGCGCCGA 

551 TGGCGGCGAT ACAGATGAGC AATACGGCGC GTATCAGGAG TTTGGGGGTC 

601 AGCCTGAAGG GTTTGTTCGG TTTTTTTGCC ATTTTGATTG TGCTTTTGGG 

651 GTGTCGGGCA ATGCCGTCTG AAGGCGGTTC AGACGGCATT GCCGAGTCAG 

701 CGTTGGACGT AGTTTTGGTA GAGGGTGATG ACTTTTTGTA CGCCGACGGT 

751 GGTGCTGACT TTTTGGGTAA TCTGCGCCTG TTCTTCGGGG GTGAGGATGC

801 CCATAACGTA GGTTACGTTG CCGTAGGTAA CGATTTTGAC GCGCGCCTGT 

851 GTGGCGGGGC TGATGCCCAA CAGCGTGGCG CGGACTTTGG ATGTGTTCCA 

901 AGTGTCGCCG GCGATGTCGC CGGCAGTGCG CGGCAGGGAG GCGACGGTAA 

951 TATAGTTGTA CACGCCTTCG GCGGCCTGTT CGGAACGTGC AATCTGACCG 

1001 ACGAACTGTT TTTCGCCTTC GGTGGCGACT TGTCCGAGCA GCAGCAGGTG 

1051 GCGGTTGTAG CCGACGACGG AGATTTGGGG CGTGTAGCCT TTGGTTTGGT 

1101 TGTTTTGGCG CAGATAGGAA CGGGCGGTGG TTTCGATACG CAACGCCATA 

1151 ACGTTGTCGT CGGTTTGCGC GCCGGTGGTT CGGCGGTCGA CGGCGGATTT 

1201 CGCGCCGACG GCGGCGCTTC CGATTACTGC GCTGACGCAG CCGCTAAGGG 

1251 CAAGGCTGAA AATGGCGGCA ATCAGGGTGC GGACGGTGTG CGGTTTGGGT 

1301 TTCATCGGGT GCTTCCTTTC TTGGGCGTTT CAGACGGCAT TGCTTTGCGC 

1351 CATGCCGTCT GA 

This corresponds to the amino acid sequence <SEQ ID 210; ORF34-1>:

1 MMMPFIMLPW IAGVPAVPGQ NRLSRISLWG LGGVFFGVSG LVWFSLGVSL

51 GCACFSGVSF RGSGRGTFVG STGVSLSVFS ACVPASSGCL SV*AVSAGCG

101 LTRFFLGAAG DGSPLPLSSV PSGCAGSDEA AWWCSGWAAS CPTTPFGSQN

151 SVSRGLSVCC GSA*RVLSPF GLNVLTMPIA NAPMAAIQMS NTARIRSLGV

201 SLKGLFGFFA ILIVLLGCRA MPSEGGSDGI AESALDVVLV EGDDFLYADG

251 GADFLGNLRL FFGGEDAHNV GYVAVGNDFD ARLCGGADAQ QRGADFGCVP

301 SVAGDVAGSA RQGGDGNIVV HAFGGLFGTC NLTDELFFAF GGDLSEQQQV

351 AVVADDGDLG RVAFGLVVLA QIGTGGGFDT QRHNVVVGLR AGGSAVDGGF

401 RADGGASDYC ADAAAKGKAE NGGNQGADGV RFGFHRVLPF LGVSDGIALR

451 HAV*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 



ORF34 shows 73.3% identity over a 161 aa overlap with an ORF (ORF34a) from strain A of N.
meningitidis:

10 20 30

QKSLSR ISLWGLGGVFFGVSGLV WFSLG VSXE CAC 

M III I I I I I I I I I I I I I I I I I M I I I t III 
MMXPXIMLPWIAGVPAV PGQKRLSR XSLWGLGGXFFGVSGLVW FSLG VSXSLGVSXGCAC 
10 20 30 40 50 60

40 50 60 70 80 90
FSGV SFRGSGRG TFVGSTGVSLSVFSACV XGWRLPVGLSCVGRLXX LTRFFLGA

I I I I M M I I I I I I I I I I I I I II I I I II : ):::):: III I 1 I 

FSGV SFRGSGRG TFVGSTGVSLSVFSACA PASSGCLSVXAVSAGCGLTRXFXGA 

70 80 90 100 110 

100 110 120 130 140 150 

AGDVILLPLSSVPSGCAGSDEAAWWCSGWAASCPTTPFGSQNSVSRGLSVCCGSAXRVLS 
(II | | I i I I I I M I I : I I I I I I I I I I I I I I I I I I I I I I I II I I M I I I : MM 
AGDGSPLPLSSVPSGCAGADEEAXXCSGWAASCPTTPFGSQNSVSRGLSVCCGSVWRVLS 
120 130 140 150 160 170 



S 

PFGXKVLTMPIANAPMAVIQMSNTARIRSL GVSLKGLFXFFAILIVLL GCRAMPSEGGSD 
180 190 200 210 220 230 

The complete length ORF34a nucleotide sequence <SEQ ID 211> is:

1 ATGATGATNC CGTTNATAAT GCTTCCTTGG ATTGCGGGTG TGCCTGCCGT

51 GCCGGGTCAG AAGAGGTTGT CGAGAANTTC TTTATGGGGT TTAGGCGGCN

101 TGTTTTTCGG GGTGTCCGGT TTGGTATGGT TTTCTTTGGG CGTTTCTNTT

151 TCTTTGGGTG TTTCTNTGGG CTGTGCCTGT TTTTCGGGTG TTTCTTTTCG 

201 GGGTTCGGGA CGGGGGACGT TTGTGGGCAG TACNGGGGTT TCTTTGAGTG 

251 TGTTTTCAGC TTGTGCTCCG GCGTCGTCCG GCTGCCTGTC GGTTTNAGCT 

301 GTGTCGGCAG GTTGCGGTTT GACCCGGNTT TTCTTNGGTG CGGCAGGGGA 

351 CGGCAGTCCG CTGCCGCTTT CGTCTGTGCC GTCCGGCTGT GCGGGTGCGG 

401 ATGAGGAGGC GTNGTNGTGT TCGGGTTGGG CGGCATCTTG TCCGACTACG 

451 CCGTTTGGCA GCCAGAATTC GGTTTCGCGG GGGCTGTCGG TGTGTTGCGG 

501 TTCGGTNTGG AGGGTTTTGT CNCCGTTCGG GTNGAATGTG CTGACGATGC 

551 CTATTGCCAA TGCGCCGATG GCGGTGATAC AGATGAGCAA TACGGCGCGT 

601 ATCAGGAGTT TGGGGGTCAG CCTGAAGGGT TTGTTCNGTT TTTTTGCCAT 

651 TTTGATTGTG CTTTTGGGGT GTCGGGCAAT GCCGTCTGAA GGCGGTTCAG 

701 ACGGCATTGC CGAGTCAGCG TTGGACGTAG TTTNGGTAGA GGGTGATGAC 

751 TTTTTGTACG CCGACGGTGG TGCTGACTTT TTGGGTAATC TGCGCCTGTT 

801 CTTCGGGGGT GAGGATGCCC ATAACGTAGG TTACGTTGCC GTAGGTAACG 

851 ATTTTGACGC GCGCCTGTGT GGCGGGGCTG ATGCCCAACA GCGTGGCGCG 

901 GACTTTGGAT GTGTTCCAAG TGTCGCCGGC GATGTCGCCG GCAGTGCGCG 

951 GCAGGGAGGC GACGGTAATG TANTTGTACA CGCCTTCGGC GGCCTGTTCG 

1001 GAACGTGCAA TCTGACCGAC GAACTGTTTC TCGCCTTCGG TGGCGACTTG 

1051 TCCGAGCAGC AGCAGGTGGC GGTTGTAGCC GACAACGGAG ATTTGGGGCG 

1101 TGTANCCTTT GGTTTGGTTG TTTTGGCGCA GATAGGAGCG GGCGGTGGTT 

1151 TCGATACGCA GCGCCATTAC GTTGTCGTCG GTTNGCGCGC CGGTGGTTCG 

1201 GCGGTCGACG GCGGATTTCG CGCCGACCGC CGCGCCGCCG ACGACTGCGC 

1251 TGACGCAGCC GCCGAGGGCA AGGCTGAGGA CGGCGGCAGT CAGGGTGCGG 

1301 ACGGTGTGCG GTTTGGGTTT CATCGGGTGC TTCCTTTCTT GGGCGTTTCA 

1351 GACGGCATTG CTTTGCGCCA TGCCGTCTGA 

This encodes a protein having amino acid sequence <SEQ ID 212>: 

1 MMXPXIMLPW IAGVPAVPGQ KRLSRXSLWG LGGXFFGVSG LVWFSLGVSX

51 SLGVSXGCAC FSGVSFRGSG RGTFVGSTGV SLSVFSACAP ASSGCLSVXA

101 VSAGCGLTRX FXGAAGDGSP LPLSSVPSGC AGADEEAXXC SGWAASCPTT

151 PFGSQNSVSR GLSVCCGSVW RVLSPFGXNV LTMPIANAPM AVIQMSNTAR

201 IRSLGVSLKG LFXFFAILIV LLGCRAMPSE GGSDGIAESA LDVVXVEGDD

251 FLYADGGADF LGNLRLFFGG EDAHNVGYVA VGNDFDARLC GGADAQQRGA

301 DFGCVPSVAG DVAGSARQGG DGNVXVHAFG GLFGTCNLTD ELFLAFGGDL

351 SEQQQVAVVA DNGDLGRVXF GLVVLAQIGA GGGFDTQRHY VVVGXRAGGS

401 AVDGGFRADR RAADDCADAA AEGKAEDGGS QGADGVRFGF HRVLPFLGVS

451 DGIALRHAV*






ORF34a and ORF34-1 show 91.3% identity in 459 aa overlap: 
                     10        20        30        40        50        60
orf34a.pep   MMXPXIMLPWIAGVPAVPGQKRLSRXSLWGLGGXFFGVSGLVWFSLGVSXSLGVSXGCAC
             || | ||||||||||||||| |||| ||||||| |||||||||||||||       ||||
orf34-1      MMMPFIMLPWIAGVPAVPGQNRLSRISLWGLGGVFFGVSGLVWFSLGVSL------GCAC
                     10        20        30        40        50

                     70        80        90       100       110       120
orf34a.pep   FSGVSFRGSGRGTFVGSTGVSLSVFSACAPASSGCLSVXAVSAGCGLTRXFXGAAGDGSP
             |||||||||||||||||||||||||||| |||||||||||||||||||| | ||||||||
orf34-1      FSGVSFRGSGRGTFVGSTGVSLSVFSACVPASSGCLSVXAVSAGCGLTRFFLGAAGDGSP
                 60        70        80        90       100       110

                    130       140       150       160       170       180
orf34a.pep   LPLSSVPSGCAGADEEAXXCSGWAASCPTTPFGSQNSVSRGLSVCCGSVWRVLSPFGXNV
             |||||||||||| || |  |||||||||||||||||||||||||||||  ||||||| ||
orf34-1      LPLSSVPSGCAGSDEAAWWCSGWAASCPTTPFGSQNSVSRGLSVCCGSAXRVLSPFGLNV
                120       130       140       150       160       170

                    190       200       210       220       230       240
orf34a.pep   LTMPIANAPMAVIQMSNTARIRSLGVSLKGLFXFFAILIVLLGCRAMPSEGGSDGIAESA
             ||||||||||| |||||||||||||||||||| |||||||||||||||||||||||||||
orf34-1      LTMPIANAPMAAIQMSNTARIRSLGVSLKGLFGFFAILIVLLGCRAMPSEGGSDGIAESA
                180       190       200       210       220       230

                    250       260       270       280       290       300
orf34a.pep   LDVVXVEGDDFLYADGGADFLGNLRLFFGGEDAHNVGYVAVGNDFDARLCGGADAQQRGA
             |||| |||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf34-1      LDVVLVEGDDFLYADGGADFLGNLRLFFGGEDAHNVGYVAVGNDFDARLCGGADAQQRGA
                240       250       260       270       280       290

                    310       320       330       340       350       360
orf34a.pep   DFGCVPSVAGDVAGSARQGGDGNVXVHAFGGLFGTCNLTDELFLAFGGDLSEQQQVAVVA
             |||||||||||||||||||||||  |||||||||||||||||| ||||||||||||||||
orf34-1      DFGCVPSVAGDVAGSARQGGDGNIVVHAFGGLFGTCNLTDELFFAFGGDLSEQQQVAVVA
                300       310       320       330       340       350

                    370       380       390       400       410       420
orf34a.pep   DNGDLGRVXFGLVVLAQIGAGGGFDTQRHYVVVGXRAGGSAVDGGFRADRRAADDCADAA
             | |||||| |||||||||| ||||||||| |||| ||||||||||||||  | | |||||
orf34-1      DDGDLGRVAFGLVVLAQIGTGGGFDTQRHNVVVGLRAGGSAVDGGFRADGGASDYCADAA
                360       370       380       390       400       410

                    430       440       450       460
orf34a.pep   AEGKAEDGGSQGADGVRFGFHRVLPFLGVSDGIALRHAVX
             | |||| || ||||||||||||||||||||||||||||||
orf34-1      AKGKAENGGNQGADGVRFGFHRVLPFLGVSDGIALRHAVX
                420       430       440       450

Homology with a predicted ORF from N. gonorrhoeae 
ORF34 shows 77.6% identity over a 161aa overlap with a predicted ORF (ORF34.ng) from N.
gonorrhoeae:

orf 34 .pep QKSLSRISLWGLGGVFFGVSGLVWFSLGVSXE CAC 35 

I I I I I I M I t M M M M I M M I I I M I I I I 

orf34ng MMMPFIMLPWIAGVPAVPGQKRLSRISLWGLAGVFFGVSGLVWFSLGVSFSLGVSLGCAC 60 

orf 34 .pep FSGVSFRGSGRGTFVGSTGVSLSVFSACVXGWRLPVGLSCV GRLXXLTRFFLGA 90 

M II M M M I : M I M M M M I II II :,'!:(: | | I I I I II II 

orf 34ng FSGVSFRGSGWGAFVGSTGVSLSVFSACVP VPVNESAARAASEGR--GLTRFFLGA 114 



orf 34 .pep AGDVILLPLSSVPSGCAGSDEAAWWCSGWAASCPTTPFGSQNSVSRGLSVCCGSAXRVLS 150

11 I M 1 M M II I M I M I I I 1 1 I 11 1 M M I: M II M M M I M 1 M M : MM 
orf34ng AGDGSPLPLSSVPSGCAGSDEAAWWCSGWAASCPTAPFGSQNSVSRGLSVCCGSVWRVLS 174 






orf34.pep S 175 
orf 34ng PFGLNVLTMPTANAPMAVIQMSNTARIRSLGVSLKGLFGFFAILIVLLGCRAMPSEGGSD 234 
The complete length ORF34ng nucleotide sequence <SEQ ID 213> is:

1 ATGATGATGC CGTTCATAAT GCTTCCTTGG ATTGCGGGTG TGCCTGCCGT 

51 GCCGGGTCAA AAGAGGTTGT CGAGAATCTC TTTATGGGGT TTGGCCGGCG 

101 TGTTTTTCGG GGTGTCCGGT TTGGTATGGT TTTCTTTGGG CGTTTCTTTT 

151 TCTTTGGGTG TTTCTTTGGG CTGCGCCTGT TTTTCGGGTG TTTCTTTTCG 

201 GGGTTCGGGA TGGGGGGCGT TTGTGGGCAG TACGGGGGTT TCTTTGAGTG 

251 TGTTTTCAGC TTGTGTTCCG GTGCCGGTTA ACGAATCGGC TGCCCGGGCC 

301 GCATCCGAAG GGCGCGGTTT gACCCGGTTT TTCTTGGGTG CGGCAGGGGA 

351 CGGCAGTCCG CTGCCGCTTT CTTCTGTGCC GTCCGGCTGT GCGGGTTCGG 

401 ATGAGGCGGC GTGGTGGTGT TCGGGTTGGG CGGCATCTTG TCCGACGGCG 

451 CCGTTTGGCA GCCAGAATTC GGTTTCGCGG GGGCTGTCGG TGTGTTGCGG 

501 TTCGGTTTGG AGGGTTTTGT CGCCGTTCGG GTTGAATGTG CTGACGATGC 

551 CTACTGCCAA TGCGCCGATG GCGGTGATAC AGATGAGCAA TACGGCGCGT 

601 ATCAGGAGTT TGGGGGTCAG CCTGAAGGGT TTGTTCGGTT TTTTTGCCAT 

651 TTTGATTGTG CTTTTGGGGT GTCGGGCAAT GCCGTCTGAA GGCGGTTCAG 

701 ACGGCATTGC CGAGTCAGCG TTGGACGTAG TTTTGGTAGA GGGTAATGAC 

751 TTTTTGTACG CCGAcggTGG TGCTGACTTT TTGGGTAATC TGCGCCTGTT 

801 CTTCGGGGGT GAGGATGCCC ATAACGTAGG TTACATTGCC GTAGGTAATG 

851 ATTTTGACGC GCGCCTGTGT AGCGGGGCTG ATGCCCAGCA GcgtgGCGCG 

901 GACTTTGGAC GTGTTCCAAG TGTCGCCGGC GATGTCGCCC GCAGTGCGCG 

951 GCAGGGAGGC GACGGTAATG TAGTTGTATA CGCCTTCGGC GGCCTGTTCG 

1001 GAACGTGCAA TCTGACCGAC GAACTGTTTT TCGCCTTCGG TGGCGACTTG 

1051 TCCGAGCAGC AGCAGGTGGC GGTTGTAGCC GACGACGGAG ATTTGGGGCG 

1101 TGTAGCCTTT GGTTTGGTTG TTTTGGCGCA GGTAGGAACG GGCGGTGGTT 

1151 TCGATACGCA ACGCCATAAC GTtgtCATCG GTTtgcgcgc CGGTGGTTcg 

1201 gCGGTCGATG ACGGATTTTG CGCCGACGGC GGCCCCGCCG ACGACTGCGC 

1251 TGAAGCAGCC GCCGAGGGCA AGGCTGAGGA CGGCGGCAAT CAGGGTGCGG 

1301 ACGGTGTGTG GTTTGGGTTT CATCGGGGAC TTCCTTTCTT GGGCGTTTCA 

1351 GACGGCATTG CTTTGCGCCA TGCCGTCTGA 

This encodes a protein having amino acid sequence <SEQ ID 214>: 

1 MMMPFIMLPW IAGVPAVPGQ KRLSRISLWG LAGVFFGVSG LVWFSLGVSF

51 SLGVSLGCAC FSGVSFRGSG WGAFVGSTGV SLSVFSACVP VPVNESAARA

101 ASEGRGLTRF FLGAAGDGSP LPLSSVPSGC AGSDEAAWWC SGWAASCPTA

151 PFGSQNSVSR GLSVCCGSVW RVLSPFGLNV LTMPTANAPM AVIQMSNTAR

201 IRSLGVSLKG LFGFFAILIV LLGCRAMPSE GGSDGIAESA LDVVLVEGND

251 FLYADGGADF LGNLRLFFGG EDAHNVGYIA VGNDFDARLC SGADAQQRGA

301 DFGRVPSVAG DVARSARQGG DGNVVVYAFG GLFGTCNLTD ELFFAFGGDL

351 SEQQQVAVVA DDGDLGRVAF GLVVLAQVGT GGGFDTQRHN VVIGLRAGGS

401 AVDDGFCADG GPADDCAEAA AEGKAEDGGN QGADGVWFGF HRGLPFLGVS

451 DGIALRHAV*

ORF34ng and ORF34-1 show 90.0% identity in 459 aa overlap: 

                     10        20        30        40              50
orf34-1.pep  MMMPFIMLPWIAGVPAVPGQNRLSRISLWGLGGVFFGVSGLVWFSLGVS------LGCAC
             |||||||||||||||||||| |||||||||| |||||||||||||||||      |||||
orf34ng      MMMPFIMLPWIAGVPAVPGQKRLSRISLWGLAGVFFGVSGLVWFSLGVSFSLGVSLGCAC
                     10        20        30        40        50        60

                 60        70        80        90       100       110
orf34-1.pep  FSGVSFRGSGRGTFVGSTGVSLSVFSACVPASSGCLSVXAVSAGCGLTRFFLGAAGDGSP
             |||||||||| | |||||||||||||||||         | | | |||||||||||||||
orf34ng      FSGVSFRGSGWGAFVGSTGVSLSVFSACVPVPVNESAARAASEGRGLTRFFLGAAGDGSP
                     70        80        90       100       110       120

                120       130       140       150       160       170
orf34-1.pep  LPLSSVPSGCAGSDEAAWWCSGWAASCPTTPFGSQNSVSRGLSVCCGSAXRVLSPFGLNV
             ||||||||||||||||||||||||||||| ||||||||||||||||||  ||||||||||
orf34ng      LPLSSVPSGCAGSDEAAWWCSGWAASCPTAPFGSQNSVSRGLSVCCGSVWRVLSPFGLNV
                    130       140       150       160       170       180

                180       190       200       210       220       230
orf34-1.pep  LTMPIANAPMAAIQMSNTARIRSLGVSLKGLFGFFAILIVLLGCRAMPSEGGSDGIAESA
             |||| |||||| ||||||||||||||||||||||||||||||||||||||||||||||||
orf34ng      LTMPTANAPMAVIQMSNTARIRSLGVSLKGLFGFFAILIVLLGCRAMPSEGGSDGIAESA
                    190       200       210       220       230       240

                240       250       260       270       280       290
orf34-1.pep  LDVVLVEGDDFLYADGGADFLGNLRLFFGGEDAHNVGYVAVGNDFDARLCGGADAQQRGA
             |||||||| ||||||||||||||||||||||||||||| ||||||||||| |||||||||
orf34ng      LDVVLVEGNDFLYADGGADFLGNLRLFFGGEDAHNVGYIAVGNDFDARLCSGADAQQRGA
                    250       260       270       280       290       300

                300       310       320       330       340       350
orf34-1.pep  DFGCVPSVAGDVAGSARQGGDGNIVVHAFGGLFGTCNLTDELFFAFGGDLSEQQQVAVVA
             ||| ||||||||| ||||||||| || |||||||||||||||||||||||||||||||||
orf34ng      DFGRVPSVAGDVARSARQGGDGNVVVYAFGGLFGTCNLTDELFFAFGGDLSEQQQVAVVA
                    310       320       330       340       350       360

                360       370       380       390       400       410
orf34-1.pep  DDGDLGRVAFGLVVLAQIGTGGGFDTQRHNVVVGLRAGGSAVDGGFRADGGASDYCADAA
             ||||||||||||||||| |||||||||||||| |||||||||| || ||||  | || ||
orf34ng      DDGDLGRVAFGLVVLAQVGTGGGFDTQRHNVVIGLRAGGSAVDDGFCADGGPADDCAEAA
                    370       380       390       400       410       420

                420       430       440       450
orf34-1.pep  AKGKAENGGNQGADGVRFGFHRVLPFLGVSDGIALRHAVX
             | |||| ||||||||| ||||| |||||||||||||||||
orf34ng      AEGKAEDGGNQGADGVWFGFHRGLPFLGVSDGIALRHAVX
                    430       440       450       460

Based on this analysis, including the presence of a putative leader sequence (double-underlined)
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 26 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 215>:

1 ATGAAAACCT TCTTCAAAAC CCTTTCCGCC GCCGCACTCG CGCTCATCCT 

51 CGCCGCCTGC GGATT . CAAA AAGACAGCGC GCCCGCCGCA TCCGCTTCTG 

101 CCGCCGCCGA CAACGGCGCG GCGTAAAAAA GAAATCGTCT TCGGCACGAC 

151 CGTCGGCGAC TTCGGCGATA TGGTCAAAGA ACAAATCCAA GCCGAGCTGG 

201 AGAAAAAAGG CTACACCGTC AAACTGGTCG AGTTTACCGA CTATGTACGC 

251 CCGAATCTGG CATTGGCTGA GGGCGAGTTG 

This corresponds to the amino acid sequence <SEQ ID 216; ORF4>: 



1 MKTFFKTLSA AALALILAAC G . QKDSAPAA SASAAADNGA AKKEIVFGTT 
51 VGDFGDMVKE QIQAELEKKG YTVKLVEFTD YVRPNLALAE GEL 

Further sequence analysis revealed the complete nucleotide sequence <SEQ ID 217>:



1 ATGAAAACCT TCTTCAAAAC CCTTTCCGCC GCCGCACTCG CGCTCATCCT 

51 CGCCGCCTGC GGCGGTCAAA AAGACAGCGC GCCCGCCGCA TCCGCTTCTG 

101 CCGCCGCCGA CAACGGCGCG GCGAAAAAAG AAATCGTCTT CGGCACGACC 

151 GTCGGCGACT TCGGCGATAT GGTCAAAGAA CAAATCCAAG CCGAGCTGGA 

201 GAAAAAAGGC TACACCGTCA AACTGGTCGA GTTTACCGAC TATGTACGCC 

251 CGAATCTGGC ATTGGCTGAG GGCGAGTTGG ACATCAACGT CTTCCAACAC 

301 AAACCCTATC TTGACGACTT CAAAAAAGAA CACAATCTGG ACATCACCGA 

351 AGTCTTCCAA GTGCCGACCG CGCCTTTGGG ACTGTACCCG GGCAAGCTGA 

401 AATCGCTGGA AGAAGTCAAA GACGGCAGCA CCGTATCCGC GCCCAACGAC 

451 CCGTCCAACT TCGCCCGCGT CTTGGTGATG CTCGACGAAC TGGGTTGGAT 

501 CAAACTCAAA GACGGCATCA ATCCGTTGAC CGCATCCAAA GCGGACATCG 

551 CCGAGAACCT GAAAAACATC AAAATCGTCG AGCTTGAAGC CGCGCAACTG 

601 CCGCGTAGCC GCGCCGACGT GGATTTTGCC GTCGTCAACG GCAACTACGC 

651 CATAAGCAGC GGCATGAAGC TGACCGAAGC CCTGTTCCAA GAACCGAGCT 

701 TTGCCTATGT CAACTGGTCT GCCGTCAAAA CCGCCGACAA AGACAGCCAA

751 TGGCTTAAAG ACGTAACCGA GGCCTATAAC TCCGACGCGT TCAAAGCCTA

801 CGCGCACAAA CGCTTCGAGG GCTACAAATC CCCTGCCGCA TGGAATGAAG

851 GCGCAGCCAA ATAA

This corresponds to the amino acid sequence <SEQ ID 218; ORF4-1>:

1 MKTFFKTLSA AALALILAAC GGQKDSAPAA SASAAADNGA AKKEIVFGTT

51 VGDFGDMVKE QIQAELEKKG YTVKLVEFTD YVRPNLALAE GELDINVFQH 

101 KPYLDDFKKE HNLDITEVFQ VPTAPLGLYP GKLKSLEEVK DGSTVSAPND 

151 PSNFARVLVM LDELGWIKLK DGINPLTASK ADIAENLKNI KIVELEAAQL 

201 PRSRADVDFA VVNGNYAISS GMKLTEALFQ EPSFAYVNWS AVKTADKDSQ 

251 WLKDVTEAYN SDAFKAYAHK RFEGYKSPAA WNEGAAK* 
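The sequence listings in this specification are printed as rows that begin with the number of the first residue in the row, followed by blocks of ten residues. The sketch below, offered only as an illustration, joins such rows into a single string and checks each printed position against the running residue count, which is a simple way to detect a damaged row. The function name `join_sequence_rows` is hypothetical.

```python
import re


def join_sequence_rows(rows):
    """Join '<position> <block> <block> ...' rows into one sequence string.

    Raises ValueError when a row's printed position does not equal the number
    of residues read so far plus one, which usually signals a damaged row.
    """
    sequence = ""
    for row in rows:
        match = re.match(r"\s*(\d+)\s+(.*)", row)
        if match is None:
            raise ValueError(f"row has no position index: {row!r}")
        position = int(match.group(1))
        residues = "".join(match.group(2).split())
        if position != len(sequence) + 1:
            raise ValueError(f"row starts at {position}, expected {len(sequence) + 1}")
        sequence += residues
    return sequence


# First two rows of SEQ ID 218 as printed above.
rows = [
    "1 MKTFFKTLSA AALALILAAC GGQKDSAPAA SASAAADNGA AKKEIVFGTT",
    "51 VGDFGDMVKE QIQAELEKKG YTVKLVEFTD YVRPNLALAE GELDINVFQH",
]
print(len(join_sequence_rows(rows)))  # prints 100
```

The same position check applies to nucleotide listings, since they follow the same row layout.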

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF4 shows 93.5% identity over a 93aa overlap with an ORF (ORF4a) from strain A of N. 
meningitidis: 

                     10        20         30        40        50       59
orf4.pep     MKTFFKTLSAAALALILAACG-QKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE
             ||||||||||||||||||||| ||||||||||||||||||| ||||||||||||||||||
orf4a        MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAXKEIVFGTTVGDFGDMVKE
                     10        20        30        40        50        60

            60        70        80        90
orf4.pep     QIQAELEKKGYTVKLVEFTDYVRPNLALAEGEL
              || ||||||||||||| ||||| |||||||||
orf4a        XIQPELEKKGYTVKLVEXTDYVRXNLALAEGELDINVXQHXXYLDDXKKXHNLDITXVXQ
                     70        80        90       100       110       120

orf4a        VPTAPLGLYPGKLKSLXXVKXGSTVSAPNDPXXFXRVLVMLDELGXIKLKDXIXXXXXXX
                    130       140       150       160       170       180

The complete length ORF4a nucleotide sequence <SEQ ID 219> is:



1 ATGAAAACCT TCTTCAAAAC CCTTTCCGCC GCCGCACTCG CGCTCATCCT 

51 CGCCGCCTGC GGCGGTCAAA AAGATAGCGC GCCCGCCGCA TCCGCTTCTG 

101 CCGCCGCCGA CAACGGCGCG GCGAANAAAG AAATCGTCTT CGGCACGACC 

151 GTCGGCGACT TCGGCGATAT GGTCAAAGAA CANATCCAAC CCGAGCTGGA 

201 GAAAAAAGGC TACACCGTCA AACTGGTCGA GTNTACCGAC TATGTGCGCN 

251 CGAATCTGGC ATTGGCTGAG GGCGAGTTGG ACATCAACGT CTTNCAACAC 

301 ANACNCTATC TTGACGACTN CAAAAAANAA CACAATCTGG ACATCACCNN 

351 AGTCTTNCAA GTGCCGACCG CGCCTTTGGG ACTGTACCCG GGCAAGCTGA 

401 AATCGCTGGA NNAAGTCAAA GANGGCAGCA CCGTATCCGC GCCCAACGAC

451 CCGTNNNACT TCGNCCGCGT CTTGGTGATG CTCGACGAAC TGGGTTNGAT

501 CAAACTCAAA GACNGCATCA NNNNGNNGNN NNNANCNANA NNNGANANNN 

551 NNNNANNNNT NNNNNNNNNN NNNNNCNNCG NNNNNNNANN NNNNNNNNNN 

601 NCGNNTNNNN NNGCNNNNNT NNANNNTNNN NNCNNCNNNN NNNNNTNNNN 

651 NANNANNAGC GGCATGAAGC TGACCGAAGC CCTGTTCCAA GAACCGAGCT 

701 TTGCCTATGT CAACTGGTCT GCCGTCAAAA CCGCCGACAA AGACAGCCAA

751 TGGCTTAAAG ACGTAACCGA GGCCTATAAC TCCGACGCGT TCAAAGCCTA

801 CGCGCACAAA CGCTTCGAGG GCTACAAATC CCCTGCCGCA TGGAATGAAG 

851 GCGCAGCCAA ATAA 

This is predicted to encode a protein having amino acid sequence <SEQ ID 220>: 



1 MKTFFKTLSA AALALILAAC GGQKDSAPAA SASAAADNGA AXKEIVFGTT

51 VGDFGDMVKE XIQPELEKKG YTVKLVEXTD YVRXNLALAE GELDINVXQH

101 XXYLDDXKKX HNLDITXVXQ VPTAPLGLYP GKLKSLXXVK XGSTVSAPND

151 PXXFXRVLVM LDELGXIKLK DXIXXXXXXX XXXXXXXXXX XXXXXXXXXX

201 XXXXAXXXXX XXXXXXXXXS GMKLTEALFQ EPSFAYVNWS AVKTADKDSQ 

251 WLKDVTEAYN SDAFKAYAHK RFEGYKSPAA WNEGAAK* 



A leader peptide is underlined. 
Further analysis of these strain A sequences revealed the complete DNA sequence <SEQ ID 221>:

1 ATGAAAACCT TCTTCAAAAC CCTTTCCGCC GCCGCACTCG CGCTCATCCT 

51 CGCCGCCTGC GGCGGTCAAA AAGATAGCGC GCCCGCCGCA TCCGCTTCTG 

101 CCGCCGCCGA CAACGGCGCG GCGAAAAAAG AAATCGTCTT CGGCACGACC 

151 GTCGGCGACT TCGGCGATAT GGTCAAAGAA CAAATCCAAC CCGAGCTGGA 

201 GAAAAAAGGC TACACCGTCA AACTGGTCGA GTTTACCGAC TATGTGCGCC 

251 CGAATCTGGC ATTGGCTGAG GGCGAGTTGG ACATCAACGT CTTCCAACAC 

301 AAACCCTATC TTGACGACTT CAAAAAAGAA CACAATCTGG ACATCACCGA 

351 AGTCTTCCAA GTGCCGACCG CGCCTTTGGG ACTGTACCCG GGCAAGCTGA 

401 AATCGCTGGA AGAAGTCAAA GACGGCAGCA CCGTATCCGC GCCCAACGAC

451 CCGTCCAACT TCGCCCGCGT CTTGGTGATG CTCGACGAAC TGGGTTGGAT

501 CAAACTCAAA GACGGCATCA ATCCGCTGAC CGCATCCAAA GCGGACATTG 

551 CCGAAAACCT GAAAAACATC AAAATCGTCG AGCTTGAAGC CGCGCAACTG 

601 CCGCGTAGCC GCGCCGACGT GGATTTTGCC GTCGTCAACG GCAACTACGC 

651 CATAAGCAGC GGCATGAAGC TGACCGAAGC CCTGTTCCAA GAACCGAGCT 

701 TTGCCTATGT CAACTGGTCT GCCGTCAAAA CCGCCGACAA AGACAGCCAA

751 TGGCTTAAAG ACGTAACCGA GGCCTATAAC TCCGACGCGT TCAAAGCCTA

801 CGCGCACAAA CGCTTCGAGG GCTACAAATC CCCTGCCGCA TGGAATGAAG 

851 GCGCAGCCAA ATAA 

This encodes a protein having amino acid sequence <SEQ ID 222; ORF4a-1>:

1 MKTFFKTLSA AALALILAAC GGQKDSAPAA SASAAADNGA AKKEIVFGTT

51 VGDFGDMVKE QIQPELEKKG YTVKLVEFTD YVRPNLALAE GELDINVFQH 

101 KPYLDDFKKE HNLDITEVFQ VPTAPLGLYP GKLKSLEEVK DGSTVSAPND 

151 PSNFARVLVM LDELGWIKLK DGINPLTASK ADIAENLKNI KIVELEAAQL 

201 PRSRADVDFA VVNGNYAISS GMKLTEALFQ EPSFAYVNWS AVKTADKDSQ 

251 WLKDVTEAYN SDAFKAYAHK RFEGYKSPAA WNEGAAK* 

ORF4a-1 and ORF4-1 show 99.7% identity in 287 aa overlap:

                     10        20        30        40        50        60
orf4a-1      MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf4-1       MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf4a-1      QIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ
             ||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf4-1       QIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf4a-1      VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf4-1       VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf4a-1      ADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNWS
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf4-1       ADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNWS
                    190       200       210       220       230       240

                    250       260       270       280
orf4a-1      AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAKX
             ||||||||||||||||||||||||||||||||||||||||||||||||
orf4-1       AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAKX
                    250       260       270       280
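When two sequences are almost identical, listing the mismatched columns is often more informative than the overall percentage. The sketch below, with the illustrative function name `substitutions`, reports each differing position for a 30-residue stretch of the two ORF4 proteins; the residue ranges are taken from SEQ ID 222 and SEQ ID 218 as printed in this section.

```python
def substitutions(top, bottom):
    """List (position, top residue, bottom residue) for each mismatched column (1-based)."""
    return [(i, a, b) for i, (a, b) in enumerate(zip(top, bottom), start=1) if a != b]


orf4a_1 = "QIQPELEKKGYTVKLVEFTDYVRPNLALAE"  # residues 61-90 of SEQ ID 222
orf4_1 = "QIQAELEKKGYTVKLVEFTDYVRPNLALAE"   # residues 61-90 of SEQ ID 218

# Shift by 60 so that positions refer to the full-length proteins.
print([(60 + pos, a, b) for pos, a, b in substitutions(orf4a_1, orf4_1)])
# prints [(64, 'P', 'A')]
```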



Homology with an outer membrane protein of Pasteurella haemolytica (accession q08869).



ORF4 and this outer membrane protein show 33% aa identity in 91 aa overlap: 
10 20

Ho? n^h* MN FKKLLGVALV S AL ALT ACKDE KAQAP 

lip2 " PaSha II 1 ::H II I: I I :|: I 

ORF4 VXTPNPDGRTPCPSFLFETATTSGENMKTFFKTLSAAAL--ALILAACGFKKTARPPHPL

110 120 130 140 150

30 40 50 60 70 80 

lio2 pasha -ATTAKTEKKAPLKVGVMTGPEAQMTEVAVKIAKEKYGLDVELVQFTEYTQPNAALHSKD 
y ' y : :: j : |: :| ::|:: :: III I : I I : I I : I : : I I I I : 

ORF4 LPPPTTARRKKEIVFGTTVGDFGDMVKEQIQAELEKKGYTVKLVEFTDYVRPNLALAEGE

160 170 180 190 200 210 






90 100 110 120 130 140 

lip2 . pasha LDANAFQTVPYLEQEVKDRGYKLAIIGNTLVWPIAAYSKKIKNISELKDGATVAIPNNAS 



ORF4 L

Homology with a predicted ORF from N. gonorrhoeae 

ORF4 shows 93.6% identity over a 94aa overlap with a predicted ORF (ORF4.ng) from N.
gonorrhoeae:

                                                   10        20        30
orf4nm.pep                                 MKTFFKTLSAAALALILAACGXQKDSAPAA
                                           ||||||||| | ||||||||| ||||||||
orf4ng       RANAVXTPNPDGRTPCLSFLFETATTSGENMKTFFKTLSTASLALILAACGGQKDSAPAA
                    200       210       220       230       240       250

                      40        50        60        70        80       89
orf4nm.pep   SASA-AADNGAAKKEIVFGTTVGDFGDMVKEQIQAELEKKGYTVKLVEFTDYVRPNLALA
             || |  ||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf4ng       SAAAPSADNGAAKKEIVFGTTVGDFGDMVKEQIQAELEKKGYTVKLVEFTDYVRPNLALA
                    260       270       280       290       300       310

            90
orf4nm.pep   EGEL
             ||||
orf4ng       EGELDINVFQHKPYLDDFKKEHNLDITEAFQVPTAPLGLYPGKLKSLEEVKDGSTVSAPN
                    320       330       340       350       360       370

The complete length ORF4ng nucleotide sequence <SEQ ID 223> was predicted to encode a 
protein having amino acid sequence <SEQ ID 224>: 

1 MKTFFKTLST ASLALILAAC GGQKDSAPAA SAAAPSADNG AAKKEIVFGT

51 TVGDFGDMVK EQIQAELEKK GYTVKLVEFT DYVRPNLALA EGELDINVFQ

101 HKPYLDDFKK EHNLDITEAF QVPTAPLGLY PGKLKSLEEV KDGSTVSAPN

151 DPSNFARALV MLNELGWIKL KDGINPLTAS KADIAENLKN IKIVELEAAQ

201 LPRSRADVDF AVVNGNYAIS SGMKLTEALF QEPSFAYVNW SAVKTADKDS

251 QWLKDVTEAY NSDAFKAYAH KRFEGYKYPA AWNEGAAK*

Further analysis revealed the complete length ORF4ng DNA sequence <SEQ ID 225> to be: 

1 atgAAAACCT TCTTCAAAAC cctttccgcc gccgcaCTCG CGCTCATCCT 

51 CGCAGCCTGc ggCggtcaAA AAGACAGCGC GCCCgcagcc tctgcCGCCG 

101 CCCCTTCTGC CGATAACGgc gCgGCGAAAA AAGAAAtcgt ctTCGGCACG 

151 Accgtgggcg acttcggcgA TAtggTCAAA GAACAAATCC AagcCGAgct

201 gGAGAAAAAA GgctACACcg tcAAattggt cgaatttacc gactatgtGC 

251 gCCCGAATCT GGCATTGGCG GAGGGCGAGT TGGACATCAA CGTCTTCCAA 

301 CACAAACCCT ATCTTGACGA TTTCAAAAAA G AAC AC AAC C TGGACATCAC 

351 CGAAGCCTTC CAAGTGCCGA CCGCGCCTTT GGGACTGTAT CCGGGCAAAC 

55 401 TGAAATCGCT GGAAGAAGTC AAAGACGGCA GCACCGTATC CGCGCCCAac 

451 gACccgTCCA ACTTCGCACG CGCCTTGGTG ATGCTGAACG AACTGGGTTG 

501 GATCAAACTC AAAGACGGCA TCAATCCGCT GACCGCATCC AAAGCCGACA 

551 TCGCGGAAAA CCTGAAAAAC ATCAAAATCG TCGAGCTTGA AGCCGCACAA 

601 CTGCCGCGCA GCCGCGCCGA CGTGGATTTT GCCGTCGTCA ACGGCAACTA 

651 CGCCATAAGC AGCGGCATGA AGCTGACCGA AGCCCTGTTC CAAGAGCCGA

701 GCTTTGCCTA TGTCAACTGG TCTGCCgtcA AAACCGCCGA CAAAGACAGC

751 CAATGGCTTA AAGACGTAAC CGAGGCCTAT AACTCCGACG CGTTCAAAGC 

801 CTACGCGCAC AAACGCTTCG AGGGCTACAA ATACCCTGCC GCATGGAATG 

851 AAGGCGCAGC CAAATAA 

This encodes a protein having amino acid sequence <SEQ ID 226; ORF4ng-1>:

1 MKTFFKTLSA AALALILAAC GGQKDSAPAA SAAAPSADNG AAKKEIVFGT
51 TVGDFGDMVK EQIQAELEKK GYTVKLVEFT DYVRPNLALA EGELDINVFQ
101 HKPYLDDFKK EHNLDITEAF QVPTAPLGLY PGKLKSLEEV KDGSTVSAPN
151 DPSNFARALV MLNELGWIKL KDGINPLTAS KADIAENLKN IKIVELEAAQ
201 LPRSRADVDF AVVNGNYAIS SGMKLTEALF QEPSFAYVNW SAVKTADKDS
251 QWLKDVTEAY NSDAFKAYAH KRFEGYKYPA AWNEGAAK*
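The relationship between a DNA sequence and the protein it encodes can be illustrated with a small translation routine. This is a sketch only, assuming the standard genetic code and reading frame 1; the names `CODON_TABLE` and `translate` are hypothetical, and the code is not the tool used to derive the sequences in this application.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; whitespace and letter case are ignored."""
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        # Codons containing ambiguity codes (e.g. N) are reported as X.
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# First 30 nucleotides of the ORF4ng DNA sequence <SEQ ID 225>.
print(translate("atgAAAACCT TCTTCAAAAC cctttccgcc"))
```

The output for these first ten codons matches the first ten residues listed for <SEQ ID 226>.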

This shows 97.6% identity in 288 aa overlap with ORF4-1: 

10 20 30 40 50 59 

orf 4-1 . pep MKTFFKTLSAAALALILAACGGQKDSAPAASASA-AADNGAAKKEIVFGTTVGDFGDMVK 
15 11 I 1 I 1 1 I I 1 I 1 11 M II I I I II M II i U I I : I : I I I I I I I I I I I I I I I I I I I t I I I I 

orf4ng-l MKTFFKTLSAAALALILAACGGQKDSAPAASAAAPSADNGAAKKEIVFGTTVGDFGDMVK 

10 20 30 40 50 60 

60 70 80 90 100 110 119 

20 orf 4-1 . pep EQIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVF 

I I 1 I i I I I I I I I I I M I I I M I I I I I i I I I I I I I I I I I I M M I I I I I I M I I M II I : I 
orf 4ng-l EQIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEAF 

70 80 90 100 110 120 

25 120 130 140 150 160 170 179 

orf 4-1 .pep QVPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTAS 
( I If I I I I I I I I I I I I I M I I M i II I I I II I I I I I I : I M I : II I i I II I I i I I I II I I 
orf4ng-l QVPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARALVMLNELGWIKLKDGINPLTAS 

130 140 150 160 170 180 

30 

180 190 200 210 220 230 239 

orf 4-1 . pep KAD I AEN LKN I K I VE LE AAQL PR SRAD V D FAWNGN Y A I S S GMKLT E AL FQE P S FA Y VN W 

M I I I I I I I I I I I II I I I I I I M I I I I I I I I I I I I M I I I M I I M M II I I II I M I I I 
orf 4ng-l KADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNW 
35 190 200 210 220 230 240 

240 250 260 270 280 

orf 4-1 . pep S AVKT ADKD S QWLKD VTE AYN S D AFKAY AHKRFEG YKS P AAWNE G AAKX 

I I I I M I II I M I II I M I I M I I I I I I M I I I I I II II I I I I II I M 
40 orf4ng-l SAVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKYPAAWNEGAAKX 

250 260 270 280 



In addition, ORF4ng-1 shows significant homology with an outer membrane protein from the
database:

ID LIP2_PASHA STANDARD; PRT; 276 AA.
AC Q08869;
DT 01-NOV-1995 (REL. 32, CREATED)
DT 01-NOV-1995 (REL. 32, LAST SEQUENCE UPDATE)
DT 01-NOV-1995 (REL. 32, LAST ANNOTATION UPDATE)
DE 28.2 KD OUTER MEMBRANE PROTEIN PRECURSOR. . . .

SCORES Init1: 279 Initn: 416 Opt: 494

Smith-Waterman score: 494; 36.0% identity in 275 aa overlap

10 20 30 40 50 

55 orf 4ng-l . pep MKTFFKTLSAAAL — ALILAACGGQKDSAPAASAAAPSADNGAAKKEIVFGTTVGDFGDM 

I I I : : I I I I I : II : I : I II : : I : : : I I I I : : I : : I 

lip2_pasha MNFKKLLGVALVSALALTACKDEKAQAPATTA KTENKAPLK VGVMTGPEAQM 

10 20 30 40 50 

60 60 70 80 90 100 110 

orf4ng-1.pep VKEQIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITE
: : : : III I : I I : I I : I : : I I I I : I I I : I I III:: I : : : : : 
lip2_pasha TEVAVKIAKEKYGLDVELVQFTEYTQPNAALHSKDLDANAFQTVPYLEQEVKDRGYKLAI
60 70 80 90 100 110

120 130 140 150 160 170 

S orf4na-l nep AFQVPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARALVMLNELGWIKLKDGINPLT 

D ° r 9 P P | | :|:: |:|||:||: ||: It I : II I I I : 

] in2 oasha igntlvwpiaayskkikniselkdgatvaipnnasntarallllqahgllklkdpkn-vf 

P - 120 130 140 150 160 170 

10 180 190 200 210 220 230 

orf4ng-l pep ASKADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTE ALFQEPSFA 
I : : I I I I I I I I II : : : : I I | | : : I I : I : : I I : : I : : : : : : 

lip2 pasha atendiienpknikivqadtslltrmlddvelavinntyagqaglspdkdgiiveskdsp 

180 190 200 210 220 230 

^ 240 250 260 270 280 289 

orf 4nq-l pep yVNWSAVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKYPAAWNEGAAKX 
| | | : : : I I : I : ::::::: I ! I : I 

lip2 pasha yvnlvvsrednkddprlqtfvksfqteevfqealklfnggwkgw 

20 ~ 240 250 260 270 
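The database comparison above reports a Smith-Waterman score. The sketch below shows the local-alignment recurrence behind such a score, under simplifying assumptions: a flat match/mismatch score and a linear gap penalty are used here purely for illustration, whereas the scores quoted in the text would have been obtained with an amino acid substitution matrix and gap parameters that are not stated in this section. The function name `smith_waterman_score` is hypothetical.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score of a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


print(smith_waterman_score("ACGT", "ACT"))
```

Only two rows of the matrix are kept in memory because the score, not the alignment itself, is returned.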

Based on this analysis, including the homology with the outer membrane protein of Pasteurella haemolytica, and on the presence of a putative prokaryotic membrane lipoprotein lipid attachment site in the gonococcal protein, it was predicted that these proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
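A putative lipoprotein lipid attachment site of the kind mentioned above can be located with a pattern search. The sketch below is an assumption-laden illustration: the regular expression follows the general form of the published prokaryotic lipoprotein signature (six residues other than D, E, R or K, followed by a short hydrophobic stretch, a small residue and the modified cysteine), but the exact pattern and the 35-residue search window used for the analysis in this application are not given in the text. The names `LIPOBOX` and `find_lipobox_cysteine` are hypothetical.

```python
import re

# Assumed pattern in the style of the prokaryotic lipoprotein signature;
# the final C is the cysteine that would carry the lipid.
LIPOBOX = re.compile(r"[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C")


def find_lipobox_cysteine(protein: str, max_pos: int = 35):
    """Return the 1-based position of the candidate lipid-modified cysteine, or None."""
    match = LIPOBOX.search(protein[:max_pos])
    # match.end() is the 0-based index just past the C, i.e. its 1-based position.
    return match.end() if match else None


# N-terminus of ORF4ng-1 <SEQ ID 226>.
print(find_lipobox_cysteine("MKTFFKTLSAAALALILAACGGQKDSAPAASAAAPSADNG"))
```

For the ORF4ng-1 N-terminus the search reports the cysteine in the stretch LAAC.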

ORF4-1 (30kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described above. The products of protein expression and purification were analyzed by SDS-PAGE. Figures 8A and 8B show, respectively, the results of affinity purification of the His-fusion and GST-fusion proteins. Purified His-fusion protein was used to immunise mice, whose sera were used for ELISA (positive result), Western blot (Figure 8C), FACS analysis (Figure 8D), and a bactericidal assay (Figure 8E). These experiments confirm that ORF4-1 is a surface-exposed protein, and that it is a useful immunogen.

Figure 8F shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF4-1. 
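A hydrophilicity plot of the kind referred to above is typically a sliding-window average of per-residue values. The following sketch assumes the Kyte-Doolittle hydropathy scale with the sign reversed, so that positive values indicate hydrophilic stretches, and a window of seven residues; the scale and window actually used for the figures are not stated in this section, and the name `hydrophilicity_profile` is hypothetical.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophilicity_profile(protein: str, window: int = 7) -> list[float]:
    """Sliding-window mean of negated hydropathy; unknown residues (e.g. X) count as 0."""
    values = [-KYTE_DOOLITTLE.get(aa, 0.0) for aa in protein.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# N-terminal residues of ORF4ng-1; each value is the mean for one window.
profile = hydrophilicity_profile("MKTFFKTLSAAALALILAACGGQKDSAPAA")
print(len(profile), round(max(profile), 2))
```

Each entry of the returned list corresponds to one window position, so a sequence of length n yields n - window + 1 points to plot.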
Example 27

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 227>: 

1 cctcgtcgtc ctcggcatgc tccagtttca aggggcgatt tactccaagg 

51 cggtggaacg tatgctcggc acggtcatcg ggctgggcgc gggtttgggc 

101 gttttatggc tgaaccagca ttatttccac ggcaacctcc tcttctacct 

40 151 caccgtcggc acggcaagcg cactggccgg ctgggcggcg gtcggcaaaa 

201 acggctacgt ccctmtgctg gcagggctga cgatgtgtat gctcatcggc 

251 gacaacggca gcgaatggct cgacagcgga ctcatgcgcg ccatgaacgt 

301 cctcatcggc gyggccatcg ccatcgccgc cgccaaactg ctgccgctga 

351 aatccacact gatgtggcgt ttcatgcttg ccgacaacct ggccgactgc 

45 4 01 agcaaaatga ttgccgaaat cagcaacggc aggcgcatga cccgcgaacg 

4 51 cctcgaggag aacatggcga aaatgcgcca aatcaacgca cgcatggtca 

501 aaagccgcag ccatctcgcc gccacatcgg gcgaaagctg catcagcccc

551 GCCATGATGG AAGCCATGCA GCACGCCCAC CGTAAAATCG TCAACACCAC

601 CGAGCTGCTC CTGACCACCG CCGCCAAGCT GCAATCTCCC AAACTCAACG 

651 GCAGCGAAAT CCGGCTGCTT GACCGCCACT TCACACTGCT CCAAAC 

701 GC AGACACGCCC GCCGCATCCG 

5 751 CATCGACACC GCCATCAACC CCGAACTGGA AGCCCTCGCC GAACACCTCC 

801 ACTACCAATG GCAGGGCTTC CTCTGGCTCA GCACCGATAT GCGTCAGGAA 

851 ATTTCCGCCC TCGTCATCCT GCTGCAACGC ACCCGCCGCA AATGGCTGGA 

901 TGCCCACGAA CGCCAACACC TGCGCCAAAG CCTGCTTGA 

This corresponds to the amino acid sequence <SEQ ID 228; ORF8>: 

1 PRRP RHAPVSRGDL LQGGGTYARH GHRAGRGFGR FMAEPALFPR
51 QPPLLPHRRH GKRTGRLGGG RQKRLRPXAG RADDVYAHRR QRQRMARQRT
101 HARHERPHRR GHRHRRRQTA AAEIHTDVAF HACRQPGRLQ QNDCRNQQRQ
151 AHDPRTPRGE HGENAPNQRT HGQKPQPSRR HIGRKLHQPR HDGSHAARPP
201 XNRQHHRAAP DHRRQAAISQ TQRQRNPAAX PPLHTAPN Q
251 TRPPHPHRHR HQPRTGSPRR TPPLPMAGLP LAQHRYASGN FRPRHPAATH
301 PPQMAGCPRT PTPAPKPA*

Computer analysis of this amino acid sequence gave the following results:

Sequence motifs

ORF8 is proline-rich and has a distribution of proline residues consistent with a surface localization. Furthermore the presence of an RGD motif may indicate a possible role in bacterial adhesion events.
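The two sequence features noted above (proline content and the RGD motif) are simple to check programmatically. This is a minimal sketch; the helper names `proline_fraction` and `motif_positions` are hypothetical, and no threshold for "proline-rich" is implied because the text does not define one.

```python
def proline_fraction(protein: str) -> float:
    """Fraction of residues that are proline; non-letter characters are ignored."""
    residues = [aa for aa in protein.upper() if aa.isalpha()]
    if not residues:
        return 0.0
    return residues.count("P") / len(residues)


def motif_positions(protein: str, motif: str = "RGD") -> list[int]:
    """1-based start positions of every occurrence of the motif."""
    width = len(motif)
    return [
        i + 1
        for i in range(len(protein) - width + 1)
        if protein[i:i + width] == motif
    ]


# First 44 residues of ORF8 <SEQ ID 228>.
orf8_start = "PRRPRHAPVSRGDLLQGGGTYARHGHRAGRGFGRFMAEPALFPR"
print(motif_positions(orf8_start), round(proline_fraction(orf8_start), 3))
```

Positions are reported relative to the fragment supplied, so they shift if a longer or complete sequence is used.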

Homology with a predicted ORF from N. gonorrhoeae

ORF8 shows 86.5% identity over a 312aa overlap with a predicted ORF (ORF8.ng) from N. 
gonorrhoeae: 



[Alignment of orf8.pep with orf8ng; only the first and last orf8.pep rows are legible.]

orf8.pep 1 PRRPRHAPVSRGDLLQGGGTYARHGHRAGRGFGRFMAEPALFPR 44

orf8.pep 295 PPQMAGCPRTPTPAPKPA* 313

The complete length ORF8ng nucleotide sequence <SEQ ID 229> is predicted to encode a protein
having amino acid sequence <SEQ ID 230>:

1 MDRDDRLRRP RHAPVPRRDL LQRGGTYARY GHRAGRGFGR FMAEPALFPR
51 QPPLLPDHRH GKRTGRLGGG RQKRLRPYVG GADDVHAHRR QRQRMARQRP
101 DARDERPHRR RHRHCRRQTA AAEIHTDVAF HACRQPGRLQ QNDCRNQQRQ
151 AYDARTFGAE YGQNAPNQRT HGQKPQPPRR HIGRKPHQPL HDGSHAARPP
201 QNRQHHRAAP DHRRQAAISQ TQRQRNPAAR PPLHTAPNRP ATNRRPHQRQ
251 TRPPHPHRHR HQPRTGSPRR TPPLPMAGFP LAQHQYASGN FRPRHPPATH
301 PPQMAGCPRT PTPAPKPA*

Based on the sequence motifs in these proteins, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 28 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 231>:

1 . . GAAATCAGCC TGCGGTCCGA CNACAGGCCG GTTTCCGTGN CGAAGCGGCG 

51 GGATTCGGAA CGTTTTCTGC TGTTGGACGG CGGCAACAGC CGGCTCAAGT 

15 101 GGGCGTGGGT GGAAAACGGC ACGTTCGCAA CCGTCGGTAG CGCGCCGTAC 

151 CGCGATTTGT CGCCTTTGGG CGCGGAGTGG GCGGAAAAGG CGGATGGAAA 

201 TGTCCGCATC GTCGGTTGCG CTGTGTGCGG AGAATTCAAA AAGGCACAAG 

251 TGCAGGAACA GCTCGCCCGA AAAATCGAGT GGCTGCCGTC TTCCGCACAG 

301 GCTTT. GGCA TACGCAACCA CTACCGCCAC CCCGAAGAAC ACGGTTCCGA 

20 351 CCGCTGGTTC AACGCCTTGG GCAGCCGCCG CTTCAGCCGC AACGCCTGCG 

4 01 TCGTCGTCAG TTGCGGCACG GCGGTAACGG TTGACGCGCT CACCGATGAC 

451 GGACATTATC TCGGAGA.GG AACCATCATG CCCGGTTTCC ACCTGATGAA 

501 AGAATCGCTC GCCGTCCGAA CCGCCAACCT CAACCGGCAC GCCGGTAAGC 

551 GTTATCCTTT CCCGACCGG. . 

This corresponds to the amino acid sequence <SEQ ID 232; ORF61>:

1 . .EISLRSDXRP VSVXKRRDSE RFLLLDGGNS RLKWAWVENG TFATVGSAPY 

51 RDLSPLGAEW AEKADGNVRI VGCAVCGEFK KAQVQEQLAR KIEWLPSSAQ 

101 AXGIRNHYRH PEEHGSDRWF NALGSRRFSR NACVVVSCGT AVTVDALTDD 

151 GHYLGXGTIM PGFHLMKESL AVRTANLNRH AGKRYPFPT . . 

Further work revealed the complete nucleotide sequence <SEQ ID 233>:

1 ATGACGGTTT TGAAGCTTTC GCACTGGCGG GTGTTGGCGG AGCTTGCCGA 

51 CGGTTTGCCG CAACACGTCT CGCAACTGGC GCGTATGGCG GATATGAAGC 

101 CGCAGCAGCT CAACGGTTTT TGGCAGCAGA TGCCGGCGCA CATACGCGGG 

151 CTGTTGCGCC AACACGACGG CTATTGGCGG CTGGTGCGCC CATTGGCGGT 

35 201 TTTCGATGCC GAAGGTTTGC GCGAGCTGGG GGAAAGGTCG GGTTTTCAGA 

251 CGGCATTGAA GCACGAGTGC GCGTCCAGCA ACGACGAGAT ACTGGAATTG 

301 GCGCGGATTG CGCCGGACAA GGCGCACAAA ACCATATGCG TGACCCACCT 

351 GCAAAGTAAG GGCAGGGGGC GGCAGGGGCG GAAGTGGTCG CACCGTTTGG 

4 01 GCGAGTGTCT GATGTTCAGT TTTGGCTGGG TGTTTGACCG GCCGCAGTAT 

40 4 51 GAGTTGGGTT CGCTGTCGCC TGTTGCGGCA GTGGCGTGTC GGCGCGCCTT 

501 GTCGCGTTTA GGTTTGGATG TGCAGATTAA GTGGCCCAAT GATTTGGTTG 

551 TCGGACGCGA CAAATTGGGC GGCATTCTGA TTGAAACGGT CAGGACGGGC 

601 GGCAAAACGG TTGCCGTGGT CGGTATCGGC ATCAATTTTG TCCTGCCCAA 

651 GGAAGTAGAA AATGCCGCTT CCGTGCAATC GCTGTTTCAG ACGGCATCGC 

45 701 GGCGGGGCAA TGCCGATGCC GCCGTGCTGC TGGAAACGCT GTTGGTGGAA 

7 51 CTGGACGCGG TGTTGTTGCA ATATGCGCGG GACGGATTTG CGCCTTTTGT 

801 GGCGGAATAT CAGGCTGCCA ACCGCGACCA CGGCAAGGCG GTATTGCTGT 

851 TGCGCGACGG CGAAACCGTG TTCGAAGGCA CGGTTAAAGG CGTGGACGGA 

901 CAAGGCGTTT TGCACTTGGA AACGGCAGAG GGCAAACAGA CGGTCGTCAG 

50 951 CGGCGAAATC AGCCTGCGGT CCGACGACAG GCCGGTTTCC GTGCCGAAGC 

1001 GGCGGGATTC GGAACGTTTT CTGCTGTTGG ACGGCGGCAA CAGCCGGCTC 

1051 AAGTGGGCGT GGGTGGAAAA CGGCACGTTC GCAACCGTCG GTAGCGCGCC 

1101 GTACCGCGAT TTGTCGCCTT TGGGCGCGGA GTGGGCGGAA AAGGCGGATG 

1151 GAAATGTCCG CATCGTCGGT TGCGCTGTGT GCGGAGAATT CAAAAAGGCA 

55 1201 CAAGTGCAGG AACAGCTCGC CCGAAAAATC GAGfGGCTGC CGTCTTCCGC 

1251 ACAGGCTTTG GGCATACGCA ACCACTACCG CCACCCCGAA GAACACGGTT 

1301 CCGACCGCTG GTTCAACGCC TTGGGCAGCC GCCGCTTCAG CCGCAACGCC

1351 TGCGTCGTCG TCAGTTGCGG CACGGCGGTA ACGGTTGACG CGCTCACCGA

14 01 TGACGGACAT TATCTCGGGG GAACCATCAT GCCCGGTTTC CACCTGATGA 

14 51 AAGAATCGCT CGCCGTCCGA ACCGCCAACC TCAACCGGCA CGCCGGTAAG 

1501 CGTTATCCTT TCCCGACCAC AACGGGCAAT GCCGTCGCCA GCGGCATGAT 

5 1551 GGATGCGGTT TGCGGCTCGG T TAT GAT GAT GCACGGGCGT TTGAAAGAAA 

1601 AAACCGGGGC GGGCAAGCCT GTCGATGTCA TCATTACCGG CGGCGGCGCG 

1651 GCAAAAGTTG CCGAAGCCCT GCCGCCTGCA TTTTTGGCGG AAAATACCGT 

17 01 GCGCGTGGCG GACAACCTCG TCATTTACGG GTTGTTGAAC ATGATTGCCG 

1751 CCGAAGGCAG GGAATATGAA CATATTTAA 

This corresponds to the amino acid sequence <SEQ ID 234; ORF61-1>:

1 MTVLKLSHWR VLAELADGLP QHVSQLARMA DMKPQQLNGF WQQMPAHIRG
51 LLRQHDGYWR LVRPLAVFDA EGLRELGERS GFQTALKHEC ASSNDEILEL
101 ARIAPDKAHK TICVTHLQSK GRGRQGRKWS HRLGECLMFS FGWVFDRPQY
151 ELGSLSPVAA VACRRALSRL GLDVQIKWPN DLVVGRDKLG GILIETVRTG
201 GKTVAVVGIG INFVLPKEVE NAASVQSLFQ TASRRGNADA AVLLETLLVE
251 LDAVLLQYAR DGFAPFVAEY QAANRDHGKA VLLLRDGETV FEGTVKGVDG
301 QGVLHLETAE GKQTVVSGEI SLRSDDRPVS VPKRRDSERF LLLDGGNSRL
351 KWAWVENGTF ATVGSAPYRD LSPLGAEWAE KADGNVRIVG CAVCGEFKKA
401 QVQEQLARKI EWLPSSAQAL GIRNHYRHPE EHGSDRWFNA LGSRRFSRNA
451 CVVVSCGTAV TVDALTDDGH YLGGTIMPGF HLMKESLAVR TANLNRHAGK
501 RYPFPTTTGN AVASGMMDAV CGSVMMMHGR LKEKTGAGKP VDVIITGGGA
551 AKVAEALPPA FLAENTVRVA DNLVIYGLLN MIAAEGREYE HI*

Figure 9 shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF61-1. Further
computer analysis of this amino acid sequence gave the following results:

Homology with the baf protein of B. pertussis (accession number U12020)

ORF61 and baf protein show 33% aa identity in 166aa overlap:



orf61 23 LLLDGGNSRLKWAWVE-NGTFATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCG 77
+L+D GNSRLK W + + A AP DL LG A R +G V G
baf 3 ILIDSGNSRLKVGWFDPDAPQAAREPAPVAFDNLDLDALGRWLATLPRRPQRALGVNVAG 62

orf61 78 EFKKAQVQEQLARKIEWLPSSAQAXGIRNHYRHPEEHGSDRWFNALGSRRFSRN 131
+ + L I WL + A G+RN YR+P++ G+DRW L +
baf 63 LARGEAIAATLRAGGCDIRWLRAQPLAMGLRNGYRNPDQLGADRWACMVGVLARQPSVHP 122

orf61 132
+V S GTA T+D + D + G G I+PG +M+ +LA TA+L
baf 123



Homology with a predicted ORF from N. meningitidis (strain A)

ORF61 shows 97.4% identity over a 189aa overlap with an ORF (ORF61a) from strain A of N. meningitidis:

10 20 30 

orf 61 .pep EISLRSDXRPVSVXKRRDSERFLLLDGGNS 

I M I t II I I II I I I t I I I I I I I I I I I I I 
45 orf 61a TVFEGTVKGVDGQGVLHLETAEGKQTVVSGEISLRSDDRPVSVPKRRDSERFLLLDGGNS 

290 300 310 320 330 340 

40 50 60 70 80 90 

orf 61 . pep RLKWAWVENGTFATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGEFKKAQVQEQLAR 
50 | | | | | M | | | M I I I I M II I I I M I M II I I I : I I I I I M I M I II I M I M I II I M I 

orf 61a RLKWAWVENGTFATVGSAPYRDLSPLGAEWAEKVDGNVRIVGCAVCGEFKKAQVQEQLAR 
350 360 370 380 390 400 

100 110 120 130 140 150 

55 orf 61 .pep KIEWLPSSAQAXGIRNHYRHPEEHGSDRWFNALGSRRFSRN ACWVSCGTAVTVDALT DD 

M I I I I I I M I M I I I I I I I I I I I I I I I I I I I I I I I II | | | | | | | | | M I I I I I I I M I 
orf 61a KIEWLPSSAQALGIRNHYRHPEEHGSDRWFNALGSRRFSRN ACWVSCGTAVTVDALT DD 
410 420 430 440 450 460

160 170 180 189

orf 61 . pep GHYLGXGTIMPGFHLMKESLAVRTANLNRHAGKRYPFPT 
I I I I I II M I I I M I I I I I I N 1 I I I I I M II I I i I t I 
5 orf 61a GHYLG-GTIMPGFHLMKESLAVRTANLNRHAGKRYPFPTTTGNAVASGMMDAVCGSVMMM 

470 480 490 500 510 520 

orf 61a HGRLKE KT G AGK P V D V 1 1 T GGG AAKV AE AL P P AFL AE NT VR V ADN LVIHGLLNLI AAEG G 

530 540 550 ' 560 570 580 

The complete length ORF61a nucleotide sequence <SEQ ID 235> is:

1 ATGACGGTTT TGAAGCCTTC GCACTGGCGG GTGTTGGCGG AGCTTGCCGA 

51 CGGTTTGCCG CAACACGTCT CGCAACTGGC GCGTATGGCG GATATGAAGC 

101 CGCAGCAGCT CAACGGTTTT TGGCAGCAGA TGCCGGCGCA CATACGCGGG 

151 CTGTTGCGCC AACACGACGG CTATTGGCGG CTGGTGCGCC CATTGGCGGT 

15 201 TTTCGATGCC GAAGGTTTGC GCGAGCTGGG GGAAAGGTCG GGTTTTCAGA 

251 CGGCATTGAA GCACGAGTGC GCGTCCAGCA ACGACGAGAT ACTGGAATTG 

301 GCGCGGATTG CGCCGGACAA GGCGCACAAA ACCATATGTG TGACCCACCT 

351 GCAAAGTAAG GGCAGGGGGC GGCAGGGGCG GAAGTGGTCG CACCGTTTGG 

401 GCGAGTGTCT GATGTTCAGT TTTGGCTGGG TGTTTGACCG GCCGCAGTAT 

20 4 51 GAGTTGGGTT CGCTGTCGCC TGTTGCGGCA GTGGCGTGCC GGCGCGCCTT 

501 GTCGCGTTTG GGTTTGAAAA CGCAAATCAA GTGGCCAAAC GATTTGGTCG 

551 TCGGACGCGA CAAATTGGGC GGCATTCTGA TTGAAACGGT CAGGACGGGC 

601 GGCAAAACGG TTGCCGTGGT CGGTATCGGC ATCAATTTCG TGCTGCCCAA 

651 GGAAGTGGAA AACGCCGCTT CCGTGCAATC GCTGTTTCAG ACGGCATCGC 

25 701 GGCGGGGAAA TGCCGATGCC GCCGTGTTGC TGGAAACGCT GTTGGCGGAA 

7 51 CTTGATGCGG TGTTGTTGCA ATATGCGCGG GACGGATTTG CGCCTTTTGT 

801 GGCGGAATAT CAGGCTGCCA ACCGCGACCA CGGCAAGGCG GTATTGCTGT 

851 TGCGCGACGG CGAAACCGTG TTCGAAGGCA CGGTTAAAGG CGTGGACGGA 

901 CAAGGCGTTC TGCACTTGGA AACGGCAGAG GGCAAACAGA CGGTCGTCAG 

30 951 CGGCGAAATC AGCCTGCGGT CCGACGACAG GCCGGTTTCC GTGCCGAAGC 

1001 GGCGGGATTC GGAACGTTTT CTGCTGTTGG ACGGCGGCAA CAGCCGGCTC 

1051 AAGTGGGCGT GGGTGGAAAA CGGCACGTTC GCAACCGTCG GTAGCGCGCC 

1101 GTACCGCGAT TTGTCGCCTT TGGGCGCGGA GTGGGCGGAA AAGGTGGATG 

1151 GAAATGTCCG CATCGTCGGT TGCGCCGTGT GCGGAGAATT CAAAAAGGCA 

35 1201 CAAGTGCAGG AACAGCTCGC CCGAAAAATC GAGTGGCTGC CGTCTTCCGC 

1251 ACAGGCTTTG GGCATACGCA ACCACTACCG CCACCCCGAA GAACACGGTT 

1301 CCGACCGCTG GTTCAACGCC TTGGGCAGCC GCCGCTTCAG CCGCAACGCC 

1351 TGCGTCGTCG TCAGTTGCGG CACGGCGGTA ACGGTTGACG CGCTCACCGA 

14 01 TGACGGACAT TATCTCGGGG GAACCATCAT GCCCGGTTTC CACCTGATGA 

40 14 51 AAGAATCGCT CGCCGTCCGA ACCGCCAACC TCAACCGGCA CGCCGGTAAG 

1501 CGTTATCCTT TCCCGACCAC AACGGGCAAT GCCGTCGCCA GCGGCATGAT 

1551 GGATGCGGTT TGCGGCTCGG TTATGATGAT GCACGGGCGT TTGAAAGAAA 

1601 AAACCGGGGC GGGCAAGCCT GTCGATGTCA TCATTACCGG CGGCGGCGCG 

1651 GCAAAAGTTG CCGAAGCCCT GCCGCCTGCA TTTTTGGCGG AAAATACCGT 

45 1701 GCGCGTGGCG GACAACCTCG TCATTCACGG GCTGCTGAAC CTGATTGCCG 

1751 CCGAAGGCGG GGAATCGGAA CATACTTAA 

This encodes a protein having amino acid sequence <SEQ ID 236>: 

1 MTVLKPSHWR VLAELADGLP QHVSQLARMA DMKPQQLNGF WQQMPAHIRG
51 LLRQHDGYWR LVRPLAVFDA EGLRELGERS GFQTALKHEC ASSNDEILEL
101 ARIAPDKAHK TICVTHLQSK GRGRQGRKWS HRLGECLMFS FGWVFDRPQY
151 ELGSLSPVAA VACRRALSRL GLKTQIKWPN DLVVGRDKLG GILIETVRTG
201 GKTVAVVGIG INFVLPKEVE NAASVQSLFQ TASRRGNADA AVLLETLLAE
251 LDAVLLQYAR DGFAPFVAEY QAANRDHGKA VLLLRDGETV FEGTVKGVDG
301 QGVLHLETAE GKQTVVSGEI SLRSDDRPVS VPKRRDSERF LLLDGGNSRL
351 KWAWVENGTF ATVGSAPYRD LSPLGAEWAE KVDGNVRIVG CAVCGEFKKA
401 QVQEQLARKI EWLPSSAQAL GIRNHYRHPE EHGSDRWFNA LGSRRFSRNA
451 CVVVSCGTAV TVDALTDDGH YLGGTIMPGF HLMKESLAVR TANLNRHAGK
501 RYPFPTTTGN AVASGMMDAV CGSVMMMHGR LKEKTGAGKP VDVIITGGGA
551 AKVAEALPPA FLAENTVRVA DNLVIHGLLN LIAAEGGESE HT*

ORF61a and ORF61-1 show 98.5% identity in 591 aa overlap:

10 20 30 40 50 60 

orf61a.pep  MTVLKPSHWRVLAELADGLPQHVSQLARMADMKPQQLNGFWQQMPAHIRGLLRQHDGYWR
            ||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf61-1     MTVLKLSHWRVLAELADGLPQHVSQLARMADMKPQQLNGFWQQMPAHIRGLLRQHDGYWR
10 20 30 40 50 60 

70 80 90 100 110 120 

orf61a pep LVRPLAVFDAEGLRELGERSGFQTALKHECASSNDEILELARIAPDKAHKTICVTHLQSK 
II I ) I M 1 1 1 M M 1 I I I U U 11 t f 1 1 1 ft I ( i 1 I I f I ( f IttllMIMIMilUM 
orf 61-1 LVRPLAVFDAEGLRELGERSGFQTALKHECASSNDEILELARIAPDKAHKTICVTHLQSK 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf61a pep GRGRQGRKWSHRLGECLMFSFGWVFDRPQYELGSLSPVAAVACRRALSRLGLKTQIKWPN 
1 | M I I 1 M I I I I i I I i I M II I I II I I t I II I t M i M I I I I I M I I I I I I : t I I I I I 
orf61-l GRGRQGRKWSHRLGECLMFSFGWVFDRPQYELGSLSPVAAVACRRALSRLGLDVQIKWPN 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf61a pep DLVVGRDKLGGILIETVRTGGKTVAWGIGINFVLPKEVENAASVQSLFQTASRRGNADA 
I M I I U i I M I M II I I I I I I I I I I I I I I I I I I I I I I I I M I I II II I I I I I I I I I I I I 
o r f 6 1 - 1 DL WGRDKLGGI LI ET VRTGGKTVAWG I GIN FVLPKEVENAASVQSLFQTASRRGNADA 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 61a . pep AVLLETLLAELDAVLLQYARDGFAPFVAEYQAANRDHGKAVLLLRDGETVFEGTVKGVDG 
I | | | M I I : I II I M I- II I i I II I I M I I I I I I I I I I I II II II II M I I I I I I I I I M I 
orf 61-1 AVLLETLLVELDAVLLQYARDGFAPFVAEYQAANRDHGKAVLLLRDGETVFEGTVKGVDG 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 61a . pep QGVLHLETAEGKQTVVSGEISLRSDDRPVSVPKRRDSERFLLLDGGNSRLKWAWVENGTF 
I I I I I I I I M I II I I I I II I I I I II I II 1 M I I M M I I I I I I I I I I II I II I I I I I I I I 
orf 61-1 QGVLHLETAEGKQT WSGEI SLRS DDRPVSVPKRRDSERFLLLDGGN SRLKWAWVENGT F 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 61a . pep ATVGSAPYRDLSPLGAEWAEKVDGNVRIVGCAVCGEFKKAQVQEQLARKIEWLPSSAQAL 
I | | | | I M I I I I II I I I I I I I : I I I I I M I I I I I I I I I I I I I I I I II M I M I I I I I I I I 
orf 61-1 ATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGEFKKAQVQEQLARKIEWLPSSAQAL 

370 380 390 400 410 420 

430 440 450 460 470 480 

orf 61a. pep GIRNHYRHPEEHGSDRWFNALGSRRFSRNACVWSCGTAVTVDALTDDGHYLGGTIMPGF 
I I I I I I I I M I I M I I I M I II II I II M I I I I II I I II II II I I I I II I I I I II I M I I 
orf 61-1 GIRNHYRHPEEHGSDRWFNALGSRRFSRNACVWSCGTAVTVDALTDDGHYLGGTIMPGF 

430 440 450 460 470 480 

490 500 510 520 530 540 

or f 61a . pep HLMKE S LAVRT AN LNRHAGKRY P F PTT TGNAVASGMMDAVCG S VMMMHGRLKEKTGAGKP 
I I I I I I I M II I II I I I I I I I I I I I I I I I I I I I I I I I II I I II I I I I M M I I M I II I I 
o r f 6 1 - 1 HLMKE S LAVRTANLNRHAGKRYP FPTTTGNAVASGMMDAVCGS VMMMHGRLKEKTGAGKP 

490 500 510 520 530 540 

550 560 570 580 590 

orf 61a. pep VDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIHGLLNLIAAEGGESEHTX 

I I I M II I II I I I I I I I I I M I I I I 1 I I I I I I I M : I M I : I I I I I I II 
orf 61-1 VDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIYGLLNMIAAEGREYEHIX 

550 560 570 580 590 



60 



Homology with a predicted ORF from N. gonorrhoeae

ORF61 shows 94.2% identity over a 189aa overlap with a predicted ORF (ORF61.ng) from N. gonorrhoeae:



orf61.pep                                 EISLRSDXRPVSVXKRRDSERFLLLDGGNS 30
                                          ||||| | | ||| || ||||||||:||||
orf61ng     TVCEGTVKGVDGRGVLHLETAEGEQTVVSGEISLRPDNRSVSVPKRPDSERFLLLEGGNS 211

orf61.pep   RLKWAWVENGTFATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGEFKKAQVQEQLAR 90
            |||||||||||||||||||||||||||||||||||||||||||||||| |||||:|||||
orf61ng     RLKWAWVENGTFATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGESKKAQVKEQLAR 271

orf61.pep   KIEWLPSSAQAXGIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDD 150
            ||||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||
orf61ng     KIEWLPSSAQALGIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDD 331

orf61.pep   GHYLGXGTIMPGFHLMKESLAVRTANLNRHAGKRYPFPT 189
            ||||| ||||||||||||||||||||||| |||||||||
orf61ng     GHYLG-GTIMPGFHLMKESLAVRTANLNRPAGKRYPFPTTTGNAVASGMMDAVCGSIMMM 390



An ORF61ng nucleotide sequence <SEQ ID 237> was predicted to encode a protein having amino
acid sequence <SEQ ID 238>:



1 MFSFGWAFDR PQYELGSLSP VAALACRRAL GCLGLETQIK WPNDLVVGRD
51 KLGGILIETV RAGGKTVAVV GIGINFVLPK EVENAASVQS LFQTASRRGN
101 ADAAVLLETL LAELGAVLEQ YAEEGFAPFL NEYETANRDH GKAVLLLRDG
151 ETVCEGTVKG VDGRGVLHLE TAEGEQTVVS GEISLRPDNR SVSVPKRPDS
201 ERFLLLEGGN SRLKWAWVEN GTFATVGSAP YRDLSPLGAE WAEKADGNVR
251 IVGCAVCGES KKAQVKEQLA RKIEWLPSSA QALGIRNHYR HPEEHGSDRW
301 FNALGSRRFS RNACVVVSCG TAVTVDALTD DGHYLGGTIM PGFHLMKESL
351 AVRTANLNRP AGKRYPFPTT TGNAVASGMM DAVCGSIMMM HGRLKEKNGA
401 GKPVDVIITG GGAAKVAEAL PPAFLAENTV RVADNLVIHG LLNLIAAEGG
451 ESEHA*



Further analysis revealed the complete gonococcal DNA sequence <SEQ ID 239> to be:



1 ATGACGGTTT TGAAGCCTTC GCATTGGCGG GTGTTGGCGG AGCTTGCCGA 

51 CGGTTTGCCG CAACACGTAT CGCAATTGGC GCGTGAGGCG GACATGAAGC 

101 CGCAGCAGCT CAACGGTTTT TGGCAGCAGA TGCCGGCGCA TATACGCGGG 

151 CTGTTGCGCC AACACGACGG CTATTGGCGG CTGGTGCGCC CCTTGGCGGT 

201 TTTCGATGCC GAAGGTTTGC GCGATCTGGG GGAAAGGTCG GGTTTTCAGA 

251 CGGCATTGAA GCACGAGTGC GCGTCCAGCA ACGACGAGAT ACTGGAATTG 

301 GCGCGGATTG CGCCGGACAA GGCGCACAAA ACCATATGCG TGACCCACCT 

351 GCAAAGTAAG GGCAGGGGGC GGCAGGGGCG GAAGTGGTCG CACCGTTTGG 

401 GCGAGTGCCT GATGTTCAGT TTCGGCTGGG CGTTTGACCG GCCGCAGTAT 

451 GAGTTGGGTT CGCTGTCGCC TGTTGCGGCA CTTGCGTGCC GGCGCGCTTT 

501 GGGGTGTTTG GGTTTGGAAA CGCAAATCAA GTGGCCAAAC GATTTGGTCG 

551 TCGGACGCGA CAAATTGGGC GGCATTCTGA TTGAAACAGT CAGGGCGGGC 

601 GGTAAAACGG TTGCCGTGGT CGGTATCGGC ATCAATTTCG TGCTGCCCAA 

651 GGAAGTGGAA AACGCCGCTT CCGTGCAGTC GCTGTTTCAG ACGGCATCGC 

701 GGCGGGGCAA TGCCGATGCC GCCGTATTGC TGGAAACATT GCTTGCGGAA 

7 51 CTGGGCGCGG TGTTGGAACA ATATGCGGAA GAAGGGTTCG CGCCATTTTT 

801 AAATGAGTAT GAAACGGCCA ACCGCGACCA CGGCAAGGCG GTATTGCTGT 

851 TGCGCGACGG CGAAACCGTG TGCGAAGGCA CGGTTAAAGG CGTGGACGGA 

901 CGAGGCGTTC TGCACTTGGA AACGGCAgaa ggcgaACAGa cggtcgtcag 

951 cggcgaaaTC AGcctGCggc ccgacaacaG GTCGGtttcc gtgccgaagc 

1001 ggccggatTC GgaacgtTTT tTGCtgttgg aaggcgggaa cagccgGCTC 

1051 AAGTGGGCGT GggtggAAAa cggcacgttc gcaaccgtgg gcagcgcgCc 

1101 gtaCCGCGAT TTGTCGCCTT TGGGCGCGGA GTGGGCGGAA AAGGCGGATG 

1151 GAAATGTCCG CATCGTCGGT TGCGCCGTGT GCGGAGAATC CAAAAAGGCA 

1201 CAAGTGAAGG AACAGCTCGC CCGAAAAATC GAGTGGCTGC CGTCTTCCGC 

1251 ACAGGCTTTG GGCATACGCA ACCACTACCG CCACCCCGAA GAACACGGTT 

1301 CCGACCGTTG GTTCAACGCC TTGGGCAGCC GCCGCTTCAG CCGCAACGCC 

1351 TGCGTCGTCG TCAGTTGCGG CACGGCGGTA ACGGTTGACG CGCTCACCGA 

14 01 TGACGGACAT TATCTCGGCG GAACCATCAT GCCCGGCTTC CACCTGATGA 

14 51 AAGAATCGCT CGCCGTCCGA ACCGCCAACC TCAACCGCCC CGCCGGCAAA 

1501 CGTTACCCTT TCCCGACCAC AACGGGCAAC GCCGTCGCAA GCGGCATGAT 

1551 GGACGCGGTT TGCGGCTCGA TAATGATGAT GCACGGCCGT TTGAAAGAAA 

1601 AAAACGGCGC GGGCAAGCCT GTCGATGTCA TCATTACCGG CGGCGGCGCG 

1651 GCGAAAGTCG CCGAAGCCCT GCCGCCTGCA TTTTTGGCGG AAAATACCGT 

17 01 GCGCGTGGCG GACAACCTCG TCATCCACGG GCTGCTGAAC CTGATTGCCG 

1751 CCGAAGGCGG GGAATCGGAA CACGCTTAA 

This corresponds to the amino acid sequence <SEQ ID 240; ORF61ng-1>:



1 MTVLKPSHWR VLAELADGLP QHVSQLAREA DMKPQQLNGF WQQMPAHIRG
51 LLRQHDGYWR LVRPLAVFDA EGLRDLGERS GFQTALKHEC ASSNDEILEL
101 ARIAPDKAHK TICVTHLQSK GRGRQGRKWS HRLGECLMFS FGWAFDRPQY
151 ELGSLSPVAA LACRRALGCL GLETQIKWPN DLVVGRDKLG GILIETVRAG
201 GKTVAVVGIG INFVLPKEVE NAASVQSLFQ TASRRGNADA AVLLETLLAE
251 LGAVLEQYAE EGFAPFLNEY ETANRDHGKA VLLLRDGETV CEGTVKGVDG
301 RGVLHLETAE GEQTVVSGEI SLRPDNRSVS VPKRPDSERF LLLEGGNSRL
351 KWAWVENGTF ATVGSAPYRD LSPLGAEWAE KADGNVRIVG CAVCGESKKA
401 QVKEQLARKI EWLPSSAQAL GIRNHYRHPE EHGSDRWFNA LGSRRFSRNA
451 CVVVSCGTAV TVDALTDDGH YLGGTIMPGF HLMKESLAVR TANLNRPAGK
501 RYPFPTTTGN AVASGMMDAV CGSIMMMHGR LKEKNGAGKP VDVIITGGGA
551 AKVAEALPPA FLAENTVRVA DNLVIHGLLN LIAAEGGESE HA*



ORF61ng-1 and ORF61-1 show 93.9% identity in 591 aa overlap:



orf 61ng-l .pep 



orf 61-1 



orf 61ng-l .pep 
orf 61-1 



MTVLKPSHWRVLAELADGLPQHVSQLAREADMKPQQLNGFWQQMPAHIRGLLRQHDGYWR 60 

Mill I II I M I I I I I I I II II I I I t I I I II 11 I M M M I II I I I I I I I I I I M I I 1 

MTVLKLSHWRVLAELADGLPQHVSQLARMADMKPQQLNGFWQQMPAHIRGLLRQHDGYWR 60 

LVRPLAVFDAEGLRDLGERSGFQTALKHECASSNDEILELARIAPDKAHKTICVTHLQSK 120 
I I II M II I I I I M : I ! II M II II II I I II I M I 1 I I I I I I I I M I I I I II I I I I M II 

LVRPLAVFDAEGLRELGERSGFQTALKHECASSNDEILELARIAPDKAHKTICVTHLQSK 120 



orf 61ng-l .pep GRGRQGRKWSHRLGECLMFSFGWAFDRPQYELGSLSPVAALACRRALGCLGLETQIKWPN 180 

I M I I II I M M M I I M I I I I I : I I I I M I I I I I I I II I : M I I I I : II I :: I I I II I 
orf 61-1 GRGRQGRKWSHRLGECLMFSFGWVFDRPQYELGSLSPVAAVACRRALSRLGLDVQIKWPN 180 

orf 61ng-l . pep DL VVGRDKLGGI L I E T VRAGGKTVAWG I G INFVLPKE VENAAS VQSL FQTASRRGNADA 24 0 
M I I II II M M I I I II I : I I M I I I I I I I I I I I I M II II I I M M I I I I M I M M M 
DLWGRDKLGGILIETVRTGGKTVAWGIG INFVLPKE VENAAS VQSLFQTASRRGNADA 24 0 



orf 61-1 
orf 61ng-l .pep 
orf61-l 
orf 61ng-l .pep 
orf61-l 
orf 61ng-l .pep 
orf 61-1 
orf 61ng-l . pep 
orf 61-1 
orf 61ng-l . pep 
orf 61-1 
orf 61ng-l . pep 
orf61-l 



AVLLETLLAELGAVLEQYAEEGFAPFLNEYETANRDHGKAVLLLRDGETVCEGTVKGVDG 300 
I I I M I M : I I Ml I M : : I I M I : II : : I I I M 11 II M M I I I II M I I II I M 
AVLLETLLVELDAVLLQYARDGFAPFVAEYQAANRDHGKAVLLLRDGETVFEGTVKGVDG 300 

RGVLHLETAEGEQTVVSGEISLRPDNRSVSVPKRPDSERFLLLEGGNSRLKWAWVENGTF 3 60 

: I M I I M I M : I II II M I M I I : I I II I I I I I I I I I M : I M M M I I I M I M I 

QG VLHLETAEGKQT WSGE I S LRS DDRP VS VPKRRDSERFLLLDGGNSRLKWAWVENGT F 360 

ATVGSAPYRDLS PLGAEWAEKADGNVRIVGCAVCGESKKAQVKEQLARKIEWLPS SAQAL 4 20 
M I I M II I II I If I M I I I I I I I II I I I I I I I I I I M M I : I I I II II I I I I I I M II 
ATVGSAPYRDLS PLGAEWAEKADGNVRIVGCAVCGEFKKAQVQEQLARKIEWLPS SAQAL 4 20 

GIRNHYRHPEEHGSDRWFNALGSRRFSRNACVWSCGTAVTVDALTDDGHYLGGTIMPGF 4 80 
M I I I I I M II I II I II M I I II I I M I M I I II I I M I I M I I I I I I I I I I I I I I I I I I 
GIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDDGHYLGGTIMPGF 4 80 

HLMKESLAVRTANLNRPAGKRYPFPTTTGNAVASGMMDAVCGSIMMMHGRLKEKNGAGKP 54 0 
I I I I II I I I I I I I I I I I I I I I I I I M I I I I I I I I I I M II I I : I I I I II I I M : I I I I I 
HLMKESLAVRTANLNRHAGKRYPFPTTTGNAVASGMMDAVCGSVMMMHGRLKEKTGAGKP 54 0 

VDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIHGLLNLIAAEGGESEHAX 593 
I II I I I I M I I I I I I I II II I I I I M I I I I M I I I : I M I : II I II I I I 
VDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIYGLLNMIAAEGREYEHIX 593 



Based on this analysis, including the homology with the baf protein of B. pertussis and the presence of a putative prokaryotic membrane lipoprotein lipid attachment site, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 29 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 241>:

1 ATGTTTTACC AAATCCTTGC CCTGATTATC TGGAGCAGCT CGTTTATTGC

51 CGCCAAATAT GTCTATGGCG GCATCGATCC CGCATTGATG GTCGGCGTGC 

101 GCCTGCTAAT TGCCGCGCTG CCTGCACTGC CCGCCTGCCG CCGTCATGTC 

151 GGCAAGATTC CGCGTGAGGA ATGGAAGCCG TTGCTGATTG TGTCGTTCGT 

201 CAACTATGTG CTGACCCTGC TGCTTCAGTT TGTCGGGTTG AAATACACTT 

251 CCGCCGCCAG CGCATCGGTC ATTGTCGGAC TCGAGCCGCT GCTGATGGTG 

301 TTTGTCGGAC ACTTTTTCTT CAACGACAAA GCGCGTGCCT ACCACTGGAT 

351 ATGCGGCGCG GCGGCATTTG CCGGTGTCGC GCTGCTGATG GCGGGCGGTG 

4 01 CGG a AG AGGG CGGCGaAGTC GGCTGGTTCG GCTGCCTGCT GGTGTTGTTG 

4 51 GCGGGCGCGG GCTTTTGTGC CGCTATGCGT CCGACGCAAA GGCTGATTGC 

501 ACGCATCGGC GCACCGGCAT TCACATCTGT TTCCATTGCC GCCGCATCGT 

551 TGATGTGCCT GCCGTTTTCG CTTGCTTTGG CGCAAAGTTA TACCGTGGAC 

601 TGGAGCGTCG GGATGGTATT GTCGCTGCTG TATTTGGGTT TGGGGTGC . . 

This corresponds to the amino acid sequence <SEQ ID 242; ORF62>: 

1 MFYQILALII WSSSFIAAKY VYGGIDPALM VGVRLLIAAL PALPACRRHV 

51 GKIPREEWKP LLIVSFVNYV LTLLLQFVGL KYTSAASASV IVGLEPLLMV 

101 FVGHFFFNDK ARAYHWICGA AAFAGVALLM AGGAEEGGEV GWFGCLLVLL 

151 AGAGFCAAMR PTQRLIARIG APAFTSVSIA AASLMCLPFS LALAQSYTVD 

201 WSVGMVLSLL YLGLGC . . 

Further work revealed the complete nucleotide sequence <SEQ ID 243>: 

1 ATGTTTTACC AAATCCTTGC CCTGATTATC TGGAGCAGCT CGTTTATTGC 

51 CGCCAAATAT GTCTATGGCG GCATCGATCC CGCATTGATG GTCGGCGTGC 

101 GCCTGCTAAT TGCCGCGCTG CCTGCACTGC CCGCCTGCCG CCGTCATGTC 

151 GGCAAGATTC CGCGTGAGGA ATGGAAGCCG TTGCTGATTG TGTCGTTCGT 

201 CAACTATGTG CTGACCCTGC TGCTTCAGTT TGTCGGGTTG AAATACACTT 

251 CCGCCGCCAG CGCATCGGTC ATTGTCGGAC TCGAGCCGCT GCTGATGGTG 

301 TTTGTCGGAC ACTTTTTCTT CAACGACAAA GCGCGTGCCT ACCACTGGAT 

351 ATGCGGCGCG GCGGCATTTG CCGGTGTCGC GCTGCTGATG GCGGGCGGTG 

401 CGGAAGAGGG CGGCGAAGTC GGCTGGTTCG GCTGCCTGCT GGTGTTGTTG

451 GCGGGCGCGG GCTTTTGTGC CGCTATGCGT CCGACGCAAA GGCTGATTGC 

501 ACGCATCGGC GCACCGGCAT TCACATCTGT TTCCATTGCC GCCGCATCGT 

551 TGATGTGCCT GCCGTTTTCG CTTGCTTTGG CGCAAAGTTA TACCGTGGAC 

601 TGGAGCGTCG GGATGGTATT GTCGCTGCTG TATTTGGGTT TGGGGTGCGG 

651 CTGGTACGCC TATTGGCTGT GGAACAAGGG GATGAGCCGT GTTCCTGCCA 

701 ATGTTTCGGG ACTGTTGATT TCGCTCGAAC CCGTCGTCGG CGTGCTGCTG 

751 GCGGTTTTGA TTTTGGGCGA ACACCTGTCG CCCGTGTCCG CCTTGGGCGT 

801 GTTTGTCGTC ATCGCCGCCA CCTTGGTTGC CGGCCGGCTG TCGCATCAAA 

851 AATAA 

This corresponds to the amino acid sequence <SEQ ID 244; ORF62-1>:

1 MFYQILALII WSSSFIAAKY VYGGIDPALM VGVRLLIAAL PALPACRRHV

51 GKIPREEWKP LLIVSFVNYV LTLLLQFVGL KYTSAASASV IVGLEPLLMV

101 FVGHFFFNDK ARAYHWICGA AAFAGVALLM AGGAEEGGEV GWFGCLLVLL

151 AGAGFCAAMR PTQRLIARIG APAFTSVSIA AASLMCLPFS LALAQSYTVD

201 WSVGMVLSLL YLGLGCGWYA YWLWNKGMSR VPANVSGLLI SLEPVVGVLL

251 AVLILGEHLS PVSALGVFVV IAATLVAGRL SHQK*



Computer analysis of this amino acid sequence gave the following results: 

Homology with hypothetical transmembrane protein HI0976 of H. influenzae (accession number Q57147)
ORF62 and HI0976 show 50% aa identity in 114 aa overlap:

Orf62    1 MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRXXXXXXXXXXXCRRHVGKIPREEWKP 60
           M YQILAL+IWSSS I  K  Y  +DP L+V VR             R   KI +   K
HI0976   1 MLYQILALLIWSSSLIVGKLTYSMMDPVLVVQVRLIIAMIIVMPLFLRRWKKIDKPMRKQ 60

Orf62   61 LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAY 114
           L  ++F NY    LLQF+GLKYTSA+SA  ++GLEPLL+VFVGHFFF  K   +
HI0976  61 LWWLAFFNYTAVFLLQFIGLKYTSASSAVTMIGLEPLLVVFVGHFFFKTKQNGF 114

Homology with a predicted ORF from N. meningitidis (strain A)

ORF62 shows 99.5% identity over a 216aa overlap with an ORF (ORF62a) from strain A of N.
meningitidis:

orf62.pep    MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62a       MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60

orf62.pep    LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62a       LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120

orf62.pep    AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62a       AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180

orf62.pep    AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGC 216
             |||||||||||||||||||||||||||||||||:||
orf62a       AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGVGCSWYAYWLWNKGMSRVPANVSGLLI 240

orf62a       SLEPVVGVLLAVLILGEHLSPVSVLGVFVVIAATLVAGRLSHQKX 285



The complete length ORF62a nucleotide sequence <SEQ ID 245> is:

1 ATGTTTTACC AAATCCTTGC CCTGATTATC TGGAGCAGCT CGTTTATTGC

51 CGCCAAATAT GTCTATGGCG GCATCGATCC CGCATTGATG GTCGGCGTGC

101 GCCTGCTGAT TGCTGCGCTG CCTGCACTGC CCGCCTGCCG CCGTCATGTC

151 GGCAAGATTC CGCGTGAGGA ATGGAAGCCG TTGCTGATTG TGTCGTTCGT

201 CAACTATGTG CTGACCCTGC TACTTCAGTT TGTCGGGTTG AAATACACTT

251 CCGCCGCCAG CGCATCGGTC ATTGTCGGAC TCGAGCCACT GCTGATGGTG

301 TTTGTCGGAC ACTTTTTCTT CAACGACAAA GCGCGTGCCT ACCACTGGAT

351 ATGCGGCGCG GCGGCATTTG CCGGTGTCGC GCTGCTGATG GCGGGCGGTG

401 CGGAAGAGGG CGGCGAAGTC GGCTGGTTCG GCTGCCTGCT GGTGTTGTTG

451 GCGGGCGCGG GCTTTTGTGC CGCTATGCGT CCGACGCAAA GGCTGATTGC

501 ACGCATCGGC GCACCGGCAT TCACATCTGT TTCCATTGCC GCCGCATCGT

551 TGATGTGCCT GCCGTTTTCG CTTGCTTTGG CGCAAAGTTA TACCGTGGAC

601 TGGAGCGTCG GAATGGTATT GTCGCTGCTG TATTTGGGCG TGGGGTGCAG

651 CTGGTACGCC TATTGGCTGT GGAACAAGGG GATGAGCCGT GTTCCTGCCA

701 ACGTTTCGGG ACTGTTGATT TCGCTCGAAC CCGTCGTCGG CGTGCTGCTG

751 GCGGTTTTGA TTTTGGGCGA ACACCTGTCG CCCGTGTCCG TCTTGGGCGT

801 GTTTGTCGTC ATCGCCGCCA CCTTGGTTGC CGGCCGGCTG TCGCATCAAA

851 AATAA



This encodes a protein having amino acid sequence <SEQ ID 246>:

1 MFYQILALII WSSSFIAAKY VYGGIDPALM VGVRLLIAAL PALPACRRHV

51 GKIPREEWKP LLIVSFVNYV LTLLLQFVGL KYTSAASASV IVGLEPLLMV

101 FVGHFFFNDK ARAYHWICGA AAFAGVALLM AGGAEEGGEV GWFGCLLVLL

151 AGAGFCAAMR PTQRLIARIG APAFTSVSIA AASLMCLPFS LALAQSYTVD

201 WSVGMVLSLL YLGVGCSWYA YWLWNKGMSR VPANVSGLLI SLEPVVGVLL

251 AVLILGEHLS PVSVLGVFVV IAATLVAGRL SHQK*



ORF62a and ORF62-1 show 98.9% identity in 284 aa overlap:

orf62a.pep   MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1      MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60

orf62a.pep   LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1      LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120

orf62a.pep   AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1      AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180

orf62a.pep   AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGVGCSWYAYWLWNKGMSRVPANVSGLLI 240
             |||||||||||||||||||||||||||||||||:||:|||||||||||||||||||||||
orf62-1      AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGCGWYAYWLWNKGMSRVPANVSGLLI 240

orf62a.pep   SLEPVVGVLLAVLILGEHLSPVSVLGVFVVIAATLVAGRLSHQKX 285
             |||||||||||||||||||||||:|||||||||||||||||||||
orf62-1      SLEPVVGVLLAVLILGEHLSPVSALGVFVVIAATLVAGRLSHQKX 285
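
The identity figure quoted for this comparison can be checked directly from the two listed protein sequences. The following Python sketch is illustrative only and is not part of the original computer analysis; the function name `percent_identity` is hypothetical, and it assumes the two sequences are already aligned over their full length without gaps, which is the case for ORF62a and ORF62-1.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two pre-aligned, equal-length, gap-free sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


# Residues 1-213 are the same in both proteins, so they are written once.
SHARED_1_213 = (
    "MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP"
    "LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA"
    "AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA"
    "AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLG"
)

# Residues 214-284 of SEQ ID 246 (ORF62a) and SEQ ID 244 (ORF62-1).
ORF62A = SHARED_1_213 + (
    "VGCSWYAYWLWNKGMSRVPANVSGLLI"
    "SLEPVVGVLLAVLILGEHLSPVSVLGVFVVIAATLVAGRLSHQK"
)
ORF62_1 = SHARED_1_213 + (
    "LGCGWYAYWLWNKGMSRVPANVSGLLI"
    "SLEPVVGVLLAVLILGEHLSPVSALGVFVVIAATLVAGRLSHQK"
)

print(f"{percent_identity(ORF62A, ORF62_1):.1f}% identity over {len(ORF62A)} aa")
```

The three differing positions (residues 214, 217 and 264) are the ones marked with a colon in the alignment.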



Homology with a predicted ORF from N. gonorrhoeae

ORF62 shows 99.5% identity over a 216aa overlap with a predicted ORF (ORF62.ng) from N.
gonorrhoeae:

orf62.pep    MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60
             |||||||||||:||||||||||||||||||||||||||||||||||||||||||||||||
orf62ng      MFYQILALIIWGSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60

orf62.pep    LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62ng      LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120

orf62.pep    AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62ng      AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180

orf62.pep    AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGC 216
             ||||||||||||||||||||||||||||||||||||
orf62ng      AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGCGWYAYWLWNKGMSRVPANASGLLI 240




The complete length ORF62ng nucleotide sequence <SEQ ID 247> is: 

1 ATGTTTTACC AAATCCTTGC CCTGATTATC TGGGGCAGCT CGTTTATTGC 

51 CGCCAAATAT GTCTATGGCG GCATCGATCC CGCATTGATG GTCGGCGTGC 

101 GCCTGCTGAT TGCCGCGCTG CCTGCACTGC CCGCCTGCCG CCGTCATGTC 

151 GGCAAGATTC CGCGTGAGGA ATGGAAGCCG TTGCTGATTG TGTCGTTCGT 

201 CAACTATGTG CTGACCCTGC TGCTTCAGTT TGTCGGGTTG AAATACACTT 

251 CCGCCGCCAG CGCATCGGTC ATTGTCGGAC TCGAGCCGCT GCTGATGGTG 

301 TTTGTCGGAC ACTTTTTCTT CAACGACAAA GCGCGTGCCT ACCACTGGAT 

351 ATGCGGCGCG GCGGCATTTG CCGGTGTCGC GCTGCTGATG GCGGGCGGTG 

401 CGGAAGAGGG CGGCGAAGTC GGCTGGTTCG GCTGCCTGCT GGTGTTGTTG 

451 GCGGGCGCGG GCTTTTGTGC CGCTATGCGT CCGACGCAAA GGCTGATTGC

501 CCGCATCGGC GCACCGGCAT TCACATCTGT TTCCATTGCC GCCGCATCGT 

551 TGATGTGCCT GCCGTTTTCG CTTGCTTTGG CGCAAAGTTA TACCGTGGAC 

601 TGGAGCGTCG GGATGGTATT GTCGCTGTTG TATTTGGGTT TGGGGTGCGG 

651 CTGGTACGCC TATTGGCTGT GGAACAAGGG GATGAGCCGT GTTCCTGCCA 

701 ACGCGTCGGG ACTGTTGATT TCGCTCGAAC CCGTCGTCGG CGTGCTGTTG 

751 GCGGTTTTGA TTTTGGGCGA ACATTTATCG CCCGTGTCCG CCTTGGGCGT 

801 GTTTGTCGTC ATCGCCGCCA CTTTCGCCGC CGGCCGGCTG TCGCGCAGGG 

851 ACGCGCAAAA CGGCAATGCC GTCTGA 

This encodes a protein having amino acid sequence <SEQ ID 248>:

1 MFYQILALII WGSSFIAAKY VYGGIDPALM VGVRLLIAAL PALPACRRHV

51 GKIPREEWKP LLIVSFVNYV LTLLLQFVGL KYTSAASASV IVGLEPLLMV

101 FVGHFFFNDK ARAYHWICGA AAFAGVALLM AGGAEEGGEV GWFGCLLVLL

151 AGAGFCAAMR PTQRLIARIG APAFTSVSIA AASLMCLPFS LALAQSYTVD

201 WSVGMVLSLL YLGLGCGWYA YWLWNKGMSR VPANASGLLI SLEPVVGVLL

251 AVLILGEHLS PVSALGVFVV IAATFAAGRL SRRDAQNGNA V*

ORF62ng and ORF62-1 show 97.9% identity in 283 aa overlap:

orf62ng.pep  MFYQILALIIWGSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60
             |||||||||||:||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1      MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60

orf62ng.pep  LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1      LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120

orf62ng.pep  AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1      AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180

orf62ng.pep  AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGCGWYAYWLWNKGMSRVPANASGLLI 240
             |||||||||||||||||||||||||||||||||||||||||||||||||||||:||||||
orf62-1      AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGCGWYAYWLWNKGMSRVPANVSGLLI 240

orf62ng.pep  SLEPVVGVLLAVLILGEHLSPVSALGVFVVIAATFAAGRLSRRDAQNGNAVX 292
             ||||||||||||||||||||||||||||||||||::|||||::
orf62-1      SLEPVVGVLLAVLILGEHLSPVSALGVFVVIAATLVAGRLSHQKX 285

Furthermore, ORF62ng shows significant homology to a hypothetical H. influenzae protein:

sp|Q57147|Y976_HAEIN HYPOTHETICAL PROTEIN HI0976 >gi|1074589|pir||B64163
hypothetical protein HI0976 - Haemophilus influenzae (strain Rd KW20)
>gi|1574004 (U32778) hypothetical [Haemophilus influenzae] Length = 128
Score = 106 bits (262), Expect = 2e-22

Identities = 56/114 (49%), Positives = 68/114 (59%)

Query: 1  MFYQILALIIWGSSFIAAKYVYGGIDPALMVGVRXXXXXXXXXXXCRRHVGKIPREEWKP 60
          M YQILAL+IW SS I  K  Y  +DP L+V VR             R   KI +   K
Sbjct: 1  MLYQILALLIWSSSLIVGKLTYSMMDPVLVVQVRLIIAMIIVMPLFLRRWKKIDKPMRKQ 60

Query: 61 LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAY 114
          L  ++F NY    LLQF+GLKYTSA+SA  ++GLEPLL+VFVGHFFF  K   +
Sbjct: 61 LWWLAFFNYTAVFLLQFIGLKYTSASSAVTMIGLEPLLVVFVGHFFFKTKQNGF 114
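
The "Identities" and "Positives" figures in the report above can be recovered from the middle line of the alignment. The sketch below is an illustration rather than part of the original analysis: it assumes the usual BLAST protein output convention (a letter marks an identical residue, `+` marks a positively scoring substitution, a space marks anything else), and the function name `blast_match_counts` is hypothetical.

```python
def blast_match_counts(midline: str) -> tuple[int, int]:
    """Return (identities, positives) counted from a BLAST alignment midline.

    Letters are identical residues; '+' are conservative substitutions.
    Positives include the identities, as in the BLAST report.
    """
    identities = sum(1 for ch in midline if ch.isalpha())
    positives = identities + midline.count("+")
    return identities, positives


# Middle lines of the two alignment blocks (residues 1-60 and 61-114).
MIDLINE = (
    "M YQILAL+IW SS I  K  Y  +DP L+V VR             R   KI +   K "
    "L  ++F NY    LLQF+GLKYTSA+SA  ++GLEPLL+VFVGHFFF  K   +"
)
ALIGNED_LENGTH = 114

identities, positives = blast_match_counts(MIDLINE)
# BLAST reports whole-number percentages, so integer division is used here.
print(f"Identities = {identities}/{ALIGNED_LENGTH} "
      f"({100 * identities // ALIGNED_LENGTH}%)")
print(f"Positives = {positives}/{ALIGNED_LENGTH} "
      f"({100 * positives // ALIGNED_LENGTH}%)")
```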

Based on this analysis, including the homology with the transmembrane protein of H. influenzae
and the putative leader sequence and several transmembrane domains in the gonococcal protein,
it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could
be useful antigens for vaccines or diagnostics, or for raising antibodies.
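
The text does not state which program was used to predict the transmembrane domains. As a sketch of one common approach, and only under that assumption, a Kyte-Doolittle hydropathy profile averages per-residue hydrophobicity over a sliding window; a 19-residue window averaging above about 1.6 is conventionally read as a candidate membrane-spanning segment. The function names, window size and threshold below are illustrative choices, not values taken from this document.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD_SCALE[res] for res in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(seq: str, window: int = 19,
                         threshold: float = 1.6) -> list[int]:
    """1-based start positions of windows whose mean exceeds the threshold."""
    profile = hydropathy_profile(seq, window)
    return [i + 1 for i, avg in enumerate(profile) if avg > threshold]


# First 50 residues of ORF62ng (SEQ ID 248) as example input.
ORF62NG_N_TERM = "MFYQILALIIWGSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHV"
print(candidate_tm_windows(ORF62NG_N_TERM))
```

A hydropathy plot of this kind only flags hydrophobic stretches; it does not distinguish a cleavable leader sequence from a membrane-spanning helix.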



Example 30

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 249>: 

1 ATGCGCCGTT TTCTACCGAT CGCAGCCATA TGCGCmGwms TCCTGkkGTA 

51 sGGACTGACG GCGGCAACCG GCAGCACCAG TTCGCTGGCG GATTATTTCT 

101 GGTGGATTGT TGCGTTCAGC GCAATGCTGC TGCTGGTGTT GTCCGCCGTT

151 TTGGCACGTT ATGTCATATT GCTGTTGAAA GACAGGCGCG ACGGCGTATT

201 CGGTTCGCtA srTyGCCAAA gsGCCTgkks TGGG.ATGTT TACGCTGGTT 
251 GCCGkACTGC CCGGCGTGTT TCTGTTCGGC TTTCCCGCAC AGTTCATCAA

301 CGGCACGATT AATTCGTGGT TCGGCAACGA TACCCACGAG GCGCTTGAAC 

351 GCAGCCTCAA TTTGAGCAAG TCCGCATTGA ATTTGGCGGC AGACAACGCC 

401 CTCGGCAACG CCGTCCCCGT GCAGATAGAC CTCATCGGCG CGGCTTCCCT

451 GCCCGGGGAT ATGGGCAGGG TGCTGGAACA TTACGCCGGC AGCGGTTTTG

501 CCCAGCTTGC CCTGTACAAy ksCGCAAGCG GCAAAATCGA AAAAAGCATC 

551 AACCCGCACA AGCTCGATCA GCCGTTTCCA GGTAAGGCGC GTTGGGAaAa 

601 AATCCaACGG GCGGGTTCGG TCAGGGATTT GGAAAGCATA GGCGGCGTAT 

651 TGTaCGCGCA GGGCTGGCTG TCGGCGGGTA CGCACwACGG GCGCGATTAC 

701 GCCTTGTTTT TCCGTCAGCC GGTTCCCAAA GGCGTGGCAG AGGATGCCGT 

751 yTTAATCGAA AAGGCAAGGG CGAAATATGC TGAGTTGAGT TACAGCAAAA

801 AAGGTTTGCA GACCTTTTTC CTGGCAACCC TGCTGATTGC CTCGCTGCTG 

851 TCGATTTTTC TTGCACTGGT CATGGCACTG TATTTCGCCC GCCGTTTCGT 

901 CGAACCCGTC CTATCGCTTG CCGAGGGGGC GAAGGCGGTG GCGCAAGGCG 

951 ATTTCAGCCA GACGCGCCCC GTGTTGCGCA ACGACGAGTT CGGACGCTTG 

1001 ACCArGTTGT TCAACCACAT GACCGAGCAG CTTTCCATCG CCAAAGATGC 

1051 AGACGAGCGC AACCGCCGGC GCGAGGAAGC CGCCAGGCAT TATCTTGAAT 

1101 GCGTGTTGGA GGGGCTGACC ACGGGCGTGG TGGTGTTTGA CGAACAAGGC 

1151 TGTCTGAAAA CCTTCAACAA AGCGGCGGGT ACC . . 

This corresponds to the amino acid sequence <SEQ ID 250; ORF64>: 

1 MRRFLPIAAI CAXXLXXGLT AATGSTSSLA DYFWWIVAFS AMLLLVXSAV 

51 LARYVILLLK DRRDGVFGSX XAKXPXXXMF TLVAXLPGVF LFGFPAQFIN 

101 GTINSWFGND THEALERSLN LSKSALNLAA DNALGNAVPV QIDLIGAASL 

151 PGDMGRVLEH YAGSGFAQLA LYNXASGKIE KSINPHKLDQ PFPGKARWEK 

201 IQRAGSVRDL ESIGGVLYAQ GWLSAGTHXG RDYALFFRQP VPKGVAEDAV 

251 LIEKARAKYA ELSYSKKGLQ TFFLATLLIA SLLSIFLALV MALYFARRFV 

301 EPVLSLAEGA KAVAQGDFSQ TRPVLRNDEF GRLTXLFNHM TEQLSIAKDA 

351 DERNRRREEA ARHYLECVLE GLTTGVVVFD EQGCLKTFNK AAGT . .
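
The partial DNA sequence above contains lower-case IUPAC ambiguity codes (for example m, w, s, k), and the corresponding positions in the translated sequence appear as X. The following Python sketch shows one way such a translation can be carried out; it is an illustration, not the procedure used in the original work. It assumes the standard genetic code, and the function name `translate_ambiguous` is hypothetical. A codon is translated to a definite residue only when every base combination allowed by the ambiguity codes encodes the same amino acid.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC nucleotide codes and the bases each one may stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_ambiguous(dna: str) -> str:
    """Translate DNA in frame 1, writing X where a codon is not unique."""
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        codon = dna[pos:pos + 3]
        # Every concrete codon the ambiguity codes allow at this position.
        expansions = product(*(IUPAC[base] for base in codon))
        residues = {CODON_TABLE["".join(c)] for c in expansions}
        protein.append(residues.pop() if len(residues) == 1 else "X")
    return "".join(protein)


# First 54 nucleotides of SEQ ID 249, including its ambiguity codes.
ORF64_START = "ATGCGCCGTTTTCTACCGATCGCAGCCATATGCGCmGwmsTCCTGkkGTAsGGA"
print(translate_ambiguous(ORF64_START))
```

For instance, GCm can only be GCA or GCC, both alanine, so it is translated as A, whereas Gwm may encode E, D or V and is therefore written as X.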

Further work revealed the complete nucleotide sequence <SEQ ID 251>:

1 ATGCGCCGTT TTCTACCGAT CGCAGCCATA TGCGCCGTCG TCCTGTTGTA 

51 CGGACTGACG GCGGCAACCG GCAGCACCAG TTCGCTGGCG GATTATTTCT 

101 GGTGGATTGT TGCGTTCAGC GCAATGCTGC TGCTGGTGTT GTCCGCCGTT 

151 TTGGCACGTT ATGTCATATT GCTGTTGAAA GACAGGCGCG ACGGCGTATT

201 CGGTTCGCAG ATTGCCAAAC GCCTTTCTGG GATGTTTACG CTGGTTGCCG

251 TACTGCCCGG CGTGTTTCTG TTCGGCGTTT CCGCACAGTT CATCAACGGC

301 ACGATTAATT CGTGGTTCGG CAACGATACC CACGAGGCGC TTGAACGCAG

351 CCTCAATTTG AGCAAGTCCG CATTGAATTT GGCGGCAGAC AACGCCCTCG

401 GCAACGCCGT CCCCGTGCAG ATAGACCTCA TCGGCGCGGC TTCCCTGCCC

451 GGGGATATGG GCAGGGTGCT GGAACATTAC GCCGGCAGCG GTTTTGCCCA

501 GCTTGCCCTG TACAATGCCG CAAGCGGCAA AATCGAAAAA AGCATCAACC 

551 CGCACAAGCT CGATCAGCCG TTTCCAGGTA AGGCGCGTTG GGAAAAAATC 

601 CAACGGGCGG GTTCGGTCAG GGATTTGGAA AGCATAGGCG GCGTATTGTA 

651 CGCGCAGGGC TGGCTGTCGG CGGGTACGCA CAACGGGCGC GATTACGCCT 

701 TGTTTTTCCG TCAGCCGGTT CCCAAAGGCG TGGCAGAGGA TGCCGTCTTA

751 ATCGAAAAGG CAAGGGCGAA ATATGCTGAG TTGAGTTACA GCAAAAAAGG 

801 TTTGCAGACC TTTTTCCTGG CAACCCTGCT GATTGCCTCG CTGCTGTCGA 

851 TTTTTCTTGC ACTGGTCATG GCACTGTATT TCGCCCGCCG TTTCGTCGAA 

901 CCCGTCCTAT CGCTTGCCGA GGGGGCGAAG GCGGTGGCGC AAGGCGATTT 

951 CAGCCAGACG CGCCCCGTGT TGCGCAACGA CGAGTTCGGA CGCTTGACCA 

1001 AGTTGTTCAA CCACATGACC GAGCAGCTTT CCATCGCCAA AGAAGCAGAC

1051 GAGCGCAACC GCCGGCGCGA GGAAGCCGCC AGGCATTATC TTGAATGCGT 

1101 GTTGGAGGGG CTGACCACGG GCGTGGTGGT GTTTGACGAA CAAGGCTGTC 

1151 TGAAAACCTT CAACAAAGCG GCGGAACAGA TTTTGGGGAT GCCGCTTACC 

1201 CCCCTGTGGG GCAGCAGCCG GCACGGTTGG CACGGCGTTT CGGCGCAGCA 

1251 GTCCCTGCTT GCCGAAGTGT TTGCCGCCAT CGGCGCGGCG GCAGGTACGG 

1301 ACAAACCGGT CCATGTGAAA TATGCCGCGC CGGACGATGC CAAAATCCTG 

1351 CTGGGCAAGG CAACCGTCCT GCCCGAAGAC AACGGCAACG GCGTGGTAAT 

1401 GGTGATTGAC GACATCACCG TTTTGATACA CGCGCAAAAA GAAGCCGCGT

1451 GGGGCGAAGT GGCGAAGCGG CTGGCACACG AAATCCGCAA TCCGCTCACG

1501 CCCATCCAGC TTTCCGCCGA ACGGCTGGCG TGGAAATTGG GCGGGAAGCT

1551 GGATGAGCAG GATGCGCAAA TCCTGACGCG TTCGACCGAC ACCATCGTCA

1601 AACAGGTGGC GGCATTGAAG GAAATGGTCG AAGCATTCCG CAATTATGCG

1651 CGTTCCCCTT CGCTCAAATT GGAAAATCAG GATTTGAACG CCTTAATCGG

1701 CGATGTGTTG GCATTGTATG AAGCCGGTCC GTGCCGGTTT GCGGCGGAGC

1751 TTGCCGGCGA ACCGCTGACG GTGGCGGCGG ATACGACCGC CATGCGGCAG

1801 GTGCTGCACA ATATTTTCAA AAATGCCGCC GAAGCGGCGG AAGAAGCCGA

1851 TGTGCCCGAA GTCAGGGTAA AATCGGAAAC AGGGCAGGAC GGTCGGATTG

1901 TCCTGACGGT TTGCGACAAC GGCAAAGGGT TCGGCAGGGA AATGCTGCAC

1951 AACGCCTTCG AGCCGTATGT AACGGACAAA CCGGCGGGAA CGGGATTGGG

2001 TCTGCCTGTG GTGAAAAAAA TCATTGAAGA ACACGGCGGC CGCATCAGCC 

2051 TGAGCAATCA GGATGCGGGT GGCGCGTGTG TCAGAATCAT CTTGCCAAAA 

2101 ACGGTAAAAA CTTATGCGTA G 

This corresponds to the amino acid sequence <SEQ ID 252; ORF64-1>:

1 MRRFLPIAAI CAVVLLYGLT AATGSTSSLA DYFWWIVAFS AMLLLVLSAV

51 LARYVILLLK DRRDGVFGSQ IAKRLSGMFT LVAVLPGVFL FGVSAQFING

101 TINSWFGNDT HEALERSLNL SKSALNLAAD NALGNAVPVQ IDLIGAASLP

151 GDMGRVLEHY AGSGFAQLAL YNAASGKIEK SINPHKLDQP FPGKARWEKI

201 QRAGSVRDLE SIGGVLYAQG WLSAGTHNGR DYALFFRQPV PKGVAEDAVL

251 IEKARAKYAE LSYSKKGLQT FFLATLLIAS LLSIFLALVM ALYFARRFVE

301 PVLSLAEGAK AVAQGDFSQT RPVLRNDEFG RLTKLFNHMT EQLSIAKEAD

351 ERNRRREEAA RHYLECVLEG LTTGVVVFDE QGCLKTFNKA AEQILGMPLT

401 PLWGSSRHGW HGVSAQQSLL AEVFAAIGAA AGTDKPVHVK YAAPDDAKIL

451 LGKATVLPED NGNGVVMVID DITVLIHAQK EAAWGEVAKR LAHEIRNPLT

501 PIQLSAERLA WKLGGKLDEQ DAQILTRSTD TIVKQVAALK EMVEAFRNYA

551 RSPSLKLENQ DLNALIGDVL ALYEAGPCRF AAELAGEPLT VAADTTAMRQ

601 VLHNIFKNAA EAAEEADVPE VRVKSETGQD GRIVLTVCDN GKGFGREMLH

651 NAFEPYVTDK PAGTGLGLPV VKKIIEEHGG RISLSNQDAG GACVRIILPK

701 TVKTYA*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF64 shows 92.6% identity over a 392aa overlap with an ORF (ORF64a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf 64 . pep MRRFLPIAAICAXXLXXGLTAATGSTSSLA DYFWWIVAFSAM LLLVLSAVLARYVILLL K 
I I I I I I I ! I I I I 1 I I I I ! I I I ! I I I H I ! I I ! I I I II I I I ! I I ! I M I ! 1 I I I I I ! 
orf 64a MRRFLPIAAI CAVVLLYGLTAATGSTSSLA DYFWWIVAFSAM LLLVLSAVLARYVILLL K 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 64 . pep DRRDGVFGSXXAKXPXXXMFTLVAXLPGVFLFGFPAQFINGTINSWFGNDTHEALERSLN
I I I I II II I II I I I I II II I II I I I I I I M I I I II M I I I I I I I I I I M I 

orf 64a DRRDGVFGSQIAKR-LSGMFTLVAVLPGVFLFGVSAQFINGTINSWFGNDTHEALERSLN

70 80 90 100 110 



130 140 150 160 170 180 

orf 64 . pep L S K S ALN LAADN ALGN AV PVQ I DL I GAAS L PG DMGRVLE H Y AG S G F AQLAL YNXAS GKI E 
I I M I M I II M M I M : I I 1 I I I I I II I I I I I II I I I 1 II I It I I I II I I I I I I I I 
orf 64a LSKSALNLAADNALGNAIPVQIDXIGAASLPXDMGRVLEHYAGSGFAQLALYNAASGKIE 
120 130 140 150 160 170 



190 200 210 220 230 240 

orf 64 . pep KSINPHKLDQPFPGKARWEKIQRAGSVRDLESIGGVLYAQGWLSAGTHXGRDYALFFRQP 
II M I II I I I I I I M M I I I I I : I I I I I I I I I I I II I I Mill II I I I I I I I I M I 
orf 64a KSINPHKLDQPFPGKARWEKIQQAGSVRDXESIGGVLYAXGWLSAXTHNGRDYALFFRQP 
180 190 200 210 220 230 



250 260 270 280 290 300 

orf 64 . pep V PKG V AE D AV LIE KARAKY AE L S Y S KKG LQT F FLAT LL I AS LL S I FLALVMALY FARRFV 
I I I II M I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I II I M I I I I I II I M I I I 

orf 64a V PKGVAE DAVLI EKARAXXXXL S Y SKKG LQT FFLAT LL I AS LL S I FLALVMALY FARRFV 

240 250 260 270 280 290 



310 320 330 340 350 360 

orf 64 . pep EPVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTXLFNHMTEQLSIAKDADERNRRREEA 
I M I I I I I I II I I I I I I I II I I I I II I II I I M I I I I I M I I I M I I : II M I I M M I 
orf 64a EPVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEA 
300 310 320 330 340 350 
370 380 390

ARHYLECVLEGLTTGVVVFDEQGCLKTFNKAAGT 
I I I I I I I I ( I I I I I I I 1 I M M I II I II I M I 

ARHYLECVLEGLTTGVVVFDEQGCLKTFNKAAEQILGMPLTPLWGSSRHGWHGVSAQQSL
360 370 380 390 400 410

LAEVFAAIGAAAGTDKPVHVKYAAPDDAKILLGKATVLPEDNXNGVVMVIDDITVLIHAQ
420 430 440 450 460 470

The complete length ORF64a nucleotide sequence <SEQ ID 253> is: 

1 ATGCGCCGTT TTCTACCGAT CGCAGCCATA TGCGCCGTCG TCCTGTTGTA 

51 CGGACTGACG GCGGCAACCG GCAGCACCAG TTCGCTGGCG GATTATTTCT 

101 GGTGGATTGT TGCGTTCAGC GCAATGCTGC TGCTGGTGTT GTCCGCCGTT 

151 TTGGCACGTT ATGTCATATT GCTGTTGAAA GACAGGCGCG ACGGCGTATT

201 CGGTTCGCAG ATTGCCAAAC GCCTTTCCGG GATGTTTACG CTGGTTGCCG

251 TACTGCCCGG CGTGTTTCTG TTCGGCGTTT CCGCACAGTT TATCAACGGC

301 ACGATTAATT CGTGGTTCGG CAACGATACC CACGAGGCGC TTGAACGCAG

351 CCTCAATTTG AGCAAGTCCG CATTGAATCT GGCGGCAGAC AACGCCCTTG

401 GCAACGCCAT CCCCGTGCAG ATAGACNTCA TCGGCGCGGC TTCCCTGCCC

451 NGGGATATGG GCAGGGTGCT GGAACATTAC GCCGGCAGCG GTTTTGCCCA

501 GCTTGCCCTG TACAATGCCG CAAGCGGCAA AATCGAAAAA AGCATCAACC 

551 CGCACAAGCT CGATCAGCCG TTTCCAGGTA AGGCGCGTTG GGAAAAAATC 

601 CAACAGGCGG GTTCGGTCAG GGATNNGGAA AGCATAGGCG GCGTATTGTA 

651 CGCGCANGGC TGGCTGTCGG CAGNNACGCA CAACGGGCGC GATTACGCCT 

701 TGTTTTTCCG TCAGCCGGTT CCCAAAGGCG TGGCAGAGGA TGCCGTCTTA

751 ATCGAAAAGG CAAGGGCGNA ANANNNTNAG TTGAGTTACA GCAAAAAAGG

801 TTTGCAGACC TTTTTCCTNG CAACCCTGCT GATTGCCTCN CTGCTGTCGA 

851 TTTTTCTTGC ACTGGTCATG GCACTGTATT TCGCCCGCCG TTTCGTCGAA 

901 CCCGTCCTAT CGCTTGCCGA GGGGGCGAAG GCGGTGGCGC AAGGCGATTT 

951 CAGCCAGACG CGCCCCGTGT TGCGCAACGA CGAGTTCGGA CGCTTGACCA 

1001 AGTTGTTCAA CCACATGACC GAGCAGCTTT CCATCGCCAA AGAAGCAGAC

1051 GAGCGCAACC GCCGGCGCGA GGAAGCCGCC AGACATTATC TCGAATGCGT 

1101 GTTGGAGGGG CTGACCACGG GCGTGGTGGT GTTTGACGAA CAAGGCTGTC 

1151 TGAAAACCTT CAACAAAGCG GCGGAACAGA TTTTGGGGAT GCCGCTTACC 

1201 CCCCTGTGGG GCAGCAGCCG GCACGGTTGG CACGGCGTTT CGGCGCAGCA 

1251 GTCCCTGCTT GCCGAAGTGT TTGCCGCCAT CGGCGCGGCG GCAGGTACGG 

1301 ACAAACCGGT CCATGTGAAA TATGCCGCGC CGGACGATGC CAAAATCCTG

1351 CTGGGCAAGG CAACCGTCCT GCCCGAAGAC AACNGCAACG GCGTGGTAAT

1401 GGTGATTGAC GACATCACCG TTTTGATACA CGCGCAAAAA GAAGCCGCGT
1451 GGGGCGAAGT GGCAAAACGG CTGGCACACG AAATCCGCAA TCCGCTCACG
1501 CCCATCCAGC TTTCTGCCGA ACGGCTGGCG TGGAAATTGG GCGGGAAGCT
1551 GGACGAGCAN GACGCGCAAA TCCTGACACG TTCGACCGAC ACCATCATCA
1601 AACAAGTGGC GGCATTAAAA GAAATGGTCG AGGCATTCCG CAATTACNCG
1651 CGTTCCCCTT CGNCTCAATT GGAAAATCAG GATTTGAACG CCTTAATCGG
1701 CGATGTGTTG GCATTGTACG AAGCTGGTCC GTGCCGGTTT GCGGCGGAAC
1751 TTGCCGGCGA ACCGCTGATG ATGGCGGCGG ATACGACCGC CATGCGGCAG
1801 GTGCTGCACA ATATTTTCAA AAATGCCGCC GAAGCGGCGG AAGAAGCCGA
1851 TGTGCCCGAA GTCAGGGTAA AATCGGAAGC GGGGCAGGAC GGACGGATTG
1901 TCCTGACAGT TTGCGACAAC GGCAAGGGGT TCGGCAGGGA AATGCTGCAC 
1951 AATGCCTTCG AGCCGTATGT AACGGACAAA CCGGCTGGAA CGGGATTGNG 
2001 ACTGCCCGTG GTGAAAAAAA TCATTGAAGA ACACGGCGGC CNCATCAGCC 
2051 TGAGCAATCA GGATGCGGGC GGCGCGTNTG TCAGAATCAT CTTGCCAAAA 
2101 ACGGTAGAAA CTTATGCGTA G 

This encodes a protein having amino acid sequence <SEQ ID 254>:

1 MRRFLPIAAI CAVVLLYGLT AATGSTSSLA DYFWWIVAFS AMLLLVLSAV

51 LARYVILLLK DRRDGVFGSQ IAKRLSGMFT LVAVLPGVFL FGVSAQFING

101 TINSWFGNDT HEALERSLNL SKSALNLAAD NALGNAIPVQ IDXIGAASLP

151 XDMGRVLEHY AGSGFAQLAL YNAASGKIEK SINPHKLDQP FPGKARWEKI

201 QQAGSVRDXE SIGGVLYAXG WLSAXTHNGR DYALFFRQPV PKGVAEDAVL

251 IEKARAXXXX LSYSKKGLQT FFLATLLIAS LLSIFLALVM ALYFARRFVE

301 PVLSLAEGAK AVAQGDFSQT RPVLRNDEFG RLTKLFNHMT EQLSIAKEAD

351 ERNRRREEAA RHYLECVLEG LTTGVVVFDE QGCLKTFNKA AEQILGMPLT

401 PLWGSSRHGW HGVSAQQSLL AEVFAAIGAA AGTDKPVHVK YAAPDDAKIL

451 LGKATVLPED NXNGVVMVID DITVLIHAQK EAAWGEVAKR LAHEIRNPLT

501 PIQLSAERLA WKLGGKLDEX DAQILTRSTD TIIKQVAALK EMVEAFRNYX

551 RSPSXQLENQ DLNALIGDVL ALYEAGPCRF AAELAGEPLM MAADTTAMRQ

601 VLHNIFKNAA EAAEEADVPE VRVKSEAGQD GRIVLTVCDN GKGFGREMLH

651 NAFEPYVTDK PAGTGLXLPV VKKIIEEHGG XISLSNQDAG GAXVRIILPK

701 TVETYA*

ORF64a and ORF64-1 show 96.6% identity in 706 aa overlap: 

10 20 30 40 50 60 

orf64a oep MRRFLPIAAICAWLLYGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK 
* H lllillllllMIMIIMMMMIMIIIIMMIIIIIMMMMMiMMIMI 
orf 64-1 MRRFLPIAAICAWLLYGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf64a pep DRRDGVFGSQIAKRLSGMFTLVAVLPGVFLFGVSAQFINGTINSWFGNDTHEALERSLNL 
I | I 1 I I 1 I I I I 1 1 I I I I I I I I I I I M ! I i I I f I I f I 1 f i N ( i M I I I I I I M i I I I i i I 
orf64-l DRRDGVFGSQIAKRLSGMFTLVAVLPGVFLFGVSAQFINGTINSWFGNDTHEALERSLNL 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf64a pep S KS ALN LAADN ALGN AI P VQ I DX I G AAS L PXDMGRVLE H YAG S G FAQLAL YN AAS GK I E K 
| | | | | | | | j | i | I i I I : II I ! I I ! I II I I I I I I I I I M I t I t I I I I I 1 I t t t It t I I I 
or f 64-1 SKSALNLAADNALGNAVPVQIDLIGAASLPGDMGRVLEHYAGSGFAQLALYNAASGKIEK 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf64a pep SINPHKLDQPF PGKARWEK I QQAG S VR DXE S I GG VL Y AX GW LS AXT HN GR D Y AL F FR Q P V 
| | | | I | I I I II I I I t t I I I I I : I I I I I I MINIMI Mill M M I II M M M M 
orf 64-1 SINPHKLDQPFPGKARWEKIQRAGSVRDLESIGGVLYAQGWLSAGTHNGRDYALFFRQPV 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 64a . pep PKGVAEDAVLIEKARAXXXXLSYSKKGLQTFFLATLLIASLLSIFLALVMALYFARRFVE 
I t M II M M M M II M M I M M II M I M I M M I I M I II I M M I M M i I 
orf 64-1 PKGVAEDAVLIEKARAKYAELSYSKKGLQTFFLATLLIASLLSIFLALVMALYFARRFVE 

250 260 270 280 290 300 






310 320 330 340 350 360 

orf 64a. pep PVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEAA 
M I M M M M M I M I I II I M I M II I I I II I M II 1 I I I i II II i I M M II M M I 
orf 64-1 PVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEAA 

310 320 330 340 350 360 






370 380 390 400 410 420 

orf 64a. pep RHYLECVLEGLTTGWVFDEQGCLKTFNKAAEQILGMPLTPLWGSSRHGWHGVSAQQSLL 

I M II I M M I II M I II II I M I II I M M M M M M M I M I M I M M II M M M 
orf 64-1 RHYLECVLEGLTTGVWFDEQGCLKTFNKAAEQILGMPLTPLWGSSRHGWHGVSAQQSLL 

370 380 390 400 410 420 

430 440 450 460 470 480 

orf 64a . pep AEVFAAIGAAAGTDKPVHVKYAAPDDAKILLGKATVLPEDNXNGWMVIDDITVLIHAQK 
M M II M II II I M II M I M I M M M M M II M M M M I I M M M M M M M 
orf 64-1 AEVFAAIGAAAGTDKPVHVKYAAPDDAKILLGKATVLPEDNGNGWMVIDDITVLIHAQK 

430 440 450 460 470 480 

490 500 510 520 530 540 

orf 64 a. pep EAAWGEVAKRLAHEIRNPLTPIQLSAERLAWKLGGKLDEXDAQILTRSTDTIIKQVAALK 

II M I I I II I I I I li I I I II M I I I I I I I II I M I I i I I II I I II I I I I I I M I I II I I 
orf 64-1 EAAWGEVAKRLAHEIRNPLTPIQLSAERLAWKLGGKLDEQDAQILTRSTDTIVKQVAALK 

490 500 510 520 530 540 






550 560 570 580 590 600 

orf 64a . pep EMVEAFRNYXRSPSXQLENQDLNALIGDVLALYEAGPCRFAAELAGEPLMMAADTTAMRQ 
I I M M M I MM M II M I I II M I II I II II II II I M M II M I : I II M M M 
orf 64-1 EMVEAFRNYARSPSLKLENQDLNALIGDVLALYEAGPCRFAAELAGEPLTVAADTTAMRQ 

550 560 570 580 590 600 






610 620 630 640 650 660 

orf 64a . pep VLHNIFKNAAEAAEEADVPEVRVKSEAGQDGRIVLTVCDNGKGFGREMLHNAFEPYVTDK 
I M I M I II II M M I I M M II I M M I M II M I I I I I I I I II M II I M II M M M 
orf 64-1 VLHNIFKNAAEAAEEADVPEVRVKSETGQDGRIVLTVCDNGKGFGREMLHNAFEPYVTDK 

610 620 630 640 650 660 
670 680 690 700

orf 64a . pep PAGTGLXLPVVKKIIEEHGGXISLSNQDAGGAXVRIILPKTVETYAX
I I I I I I I i I I I I II I I I I I I I I I I I M I I I 111111111:1111 
orf 64-1 PAGTGLGLPVVKKIIEEHGGRISLSNQDAGGACVRIILPKTVKTYAX

670 680 690 700 

Homology with a predicted ORF from N. gonorrhoeae 

ORF64 shows 86.6% identity over a 387aa overlap with a predicted ORF (ORF64.ng) from N.
gonorrhoeae (in each block below, the upper sequence is orf64.pep and the lower is orf64ng):


MRRFLPIAAICAXXLXXGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK

M | I I I I It I I t I I I I 1 I M I I I I I I I I I 1 I I I : I M I I I II I I I I I I I I I I I I I I 
MRRFLPIAAICAVVLLYGLTAATGSTSSLADYFWWIVSFSAMLLLVLSAVLARYVILLLK



60 



60 



120 



DRRDGVFGSXXAKXPXXXMFTLVAXLPGVFLFGFPAQFINGTINSWFGNDTHEALERSLN 
ilClim || II I I I I I I I: i I II: II I I I I I I I II I II I I I I I I I I 11 I 

DRRNGVFGSQIAKR-LSGMFTLVAVLPGLFLFGISAQFINGTINSWFGNDTHEALERSLN 119 

LSKSALNLAADNALGNAVPVQIDLIGAASLPGDMGRVLEHYAGSGFAQLALYNXASGKIE 180 
I I I I I I : I I I I I I : : i I I I I I I I I i i : I II I : I I I I I I I I M I I I I I I I I I I I I I I i 
LSKSALDLAADNAVSNAVPVQIDLIGTASLSGNMGSVLEHYAGSGFAQLALYNAASGKIE 17 9 

KSINPHKLDQPFPGKARWEKIQRAGSVRDLESIGGVLYAQGWLSAGTHXGRDYALFFRQP 240 
I II I i I : : I f I : I I : I I : I I :: I I I I : 11 I I i I I I II i I I I I I I I I I ! I I I I I I II I 
KSINPHQFDQPLPDKEHWEQIQQTGSVRSLESIGGVLYAQGWLSAGTHNGRDYALFFRQP 23 9 

VPKGVAEDAVLIEKARAKYAELSYSKKGLQTFFLATLLIASLLSIFLALVMALYFARRFV 300
: 1 : : I I : I I i I ! 1 I 1 I I I I 1 I I I I I I I I I I I I I I : I 1 I I I I I I II 1 I I I I I 1 I I I I 11 1 I 
IPENVAQDAVLIEKARAKYAELSYSKKGLQTFFLVTLLIASLLSIFLALVMALYFARRFV 299 

EPVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTXLFNHMTEQLSIAKDADERNRRREEA 360 
I I : I I I I I I I I II 1 I I I I I I I I I M 1 II I 1 I M I II I I I 1 I I t I I I I : I I I I I I I I 1 I I 
EPILSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEA 359 

ARHYLECVLEGLTTGVVVFDEQGCLKTFNKAAGT 394

I I 1 I I I I I I : I I I I I II I : I : I 

ARHYLECVLDGLTTGVVVSYPLSCCRTAVFSTCHSSPLSYF 400



An ORF64ng nucleotide sequence <SEQ ID 255> was predicted to encode a protein having amino
acid sequence <SEQ ID 256>:

1 MRRFLPIAAI CAVVLLYGLT AATGSTSSLA DYFWWIVSFS AMLLLVLSAV

51 LARYVILLLK DRRNGVFGSQ IAKRLSGMFT LVAVLPGLFL FGISAQFING

101 TINSWFGNDT HEALERSLNL SKSALDLAAD NAVSNAVPVQ IDLIGTASLS

151 GNMGSVLEHY AGSGFAQLAL YNAASGKIEK SINPHQFDQP LPDKEHWEQI

201 QQTGSVRSLE SIGGVLYAQG WLSAGTHNGR DYALFFRQPI PENVAQDAVL

251 IEKARAKYAE LSYSKKGLQT FFLVTLLIAS LLSIFLALVM ALYFARRFVE

301 PILSLAEGAK AVAQGDFSQT RPVLRNDEFG RLTKLFNHMT EQLSIAKEAD

351 ERNRRREEAA RHYLECVLDG LTTGVVVSYP LSCCRTAVFS TCHSSPLSYF*



Further work revealed the complete gonococcal DNA sequence <SEQ ID 257>:

1 ATGCGCCGCT TCCTACCGAT CGCAGCCATA TGCGCCGTCG TCCTGCTGTA

51 CGGATTGACG GCGGCGACCG GCAGCACCAG TTCGCTGGCG GATTATTTCT

101 GGTGGATAGT CTCGTTCAGC GCAATGCTGC TGCTGGTGTT GTCCGCCGTT

151 TTGGCACGTT ATGTCATATT GCTGTTGAAA GACAGGCGCA ACGGCGTGTT

201 CGGTTCGCAG ATTGCCAAAC GCCTTTCCGG GATGTTCACG CTGGTCGCCG

251 TACTGCCCGG CTTGTTCCTG TTCGGCATTT CCGCGCAGTT TATCAACGGC

301 ACGATTAATT CGTGGTTCGG CAACGACACC CACGAAGCCC TCGAACGCAG

351 CCTTAATTTG AGCAAGTCCG CACTGGATTT GGCGGCAGAC AATGCCGTCA

401 GCAACGCCGT TCCCGTACAG ATAGACCTCA TCGGCACCGC CTCCCTGTCG

451 GGCAATATGG GCAGTGTGCT GGAACACTAC GCCGGCAGCG GTTTTGCCCA

501 GCTTGCCCTG TACAATGCCG CAAGCGGGAA AATCGAAAAA AGCATCAATC

551 CGCACCAATT CGACCAGCCG CTTCCCGACA AAGAACATTG GGAACAGATT

601 CAGCAGACCG GTTCGGTTCG GAGTTTGGAA AGCATAGGCG GCGTATTGTA

651 CGCGCAGGGA TGGTTGTCGG CAGGTACGCA CAACGGGCGC GATTACGCGC 

701 TGTTCTTCCG CCAGCCGATT CCCGAAAATG TGGCACAGGA TGCCGTTCTG

751 ATTGAAAAGG CGCGGGCGAA ATATGCCGAA TTGAGTTACA GCAAAAAAGG 

801 TTTGCAGACC TTTTTTCTGG TAACCCTGCT GATTGCCTCG CTGCTGTCGA 

851 TTTTTCTTGC GCTGGTAATG GCACTGTATT TTGCCCGCCG TTTCGTCGAA 

901 CCCATTCTGT CGCTTGCCGA GGGCGCAAAG GCGGTGGCGC AGGGTGATTT 

951 CAGCCAGACG CGCCCCGTAT TGCGCAACGA CGAGTTCGGA CGTTTGACCA 

1001 AGCTGTTCAA CCATATGACC GAGCAGCTTT CCATCGCCAA AGAAGCAGAC 

1051 GAACGCAACC GCCGGCGCGA GGAAGCCGCC CGTCACTACC TCGAGTGCGT 

1101 GTTGGATGGG TTGACTACCG GTGTGGTGGT GTTTGACGAA AAAGGCCGTT 

1151 TGAAAACCTT CAACAAGGCG GCGGAACAGA TTTTGGGGAT GCCGCTCGCC 

1201 CCCCTGTGGG GCAGCAGCCG GCACGGTTGG CACGGCGTTT CGGCGCAGCA 

1251 GTCCCTGCTT GCCGAAGTGT TtgccgccAT CGGTGCGGCG GCAGGTACGG 

1301 ACAAACCGGT CCAGGTGGAA TATGCCGCGC CGGACGATGC CAAAATCCTG 

1351 CTGGGCAAGG CGACGGTATT GCCCGAAGAC AACGGCAACG GCGTGGTGAT 

1401 GGTGATTGAC GACATCACCG TGCTGATACG CGCGCAAAAA GAAGCCGCGT

1451 GGGGTGAAGT GGCGAAGCGG CTGGCACACG AAATCCGCAA TCCGCTCACG

1501 CCCATCCAGC TTTCCGCCGA ACGGCTGGCG TGGAAATTGG GCGGGAAGCT 

1551 GGACGATCAG GACGCGCAAA TCCTGACGCG TtcgACCGAC ACCATCATCA

1601 AACAGgtggc gGCGTTAAAA GAAATGGTCG AGGCATTCCG CAATTACGCG 

1651 CGCGCCCCTT CGCTCAAACT GGAAAATCAG GATTTGAACG CCTTAATCGG 

1701 CGATGTTTTG GCCCTGTACG AAGCCGGCCC GTGCCGGTTT GAGGCGGAAC 

1751 TTGCCGGCGA ACCGCTGATG ATGGCGGCGG ATACGACCGC CATGCGGCAG

1801 GTGCTGCACA ATATTTTCAA AAATGCCGCC GAAGCGGCGG AAGAAGCCGA 

1851 TATGCCCGAA GTCAGGGTAA AATCGGAAAC GGGGCAGGAC GGACGGATTG 

1901 TCCTGACGGT TTGCGACAAC GGCAAGGGAT TCGGCAAGGA AATGCTGCAC 

1951 AATGCTTTCG AGCCGTATGT GACGGATAAG CCGGCGGGAA CGGGACTGGG 

2001 TCTGCCTGTA GTGAAAAAAA TCATTGGAGA ACACGGCGGC CGCATCAGCC 

2051 TGAGCAATCA GGATGCGGGT GGGGCGTGTG TCAGAATCAT CTTGCCAAAA 

2101 ACGGTAGAAA CTTATGCGTA G 

This corresponds to the amino acid sequence <SEQ ID 258; ORF64ng-1>:

1 MRRFLPIAAI CAVVLLYGLT AATGSTSSLA DYFWWIVSFS AMLLLVLSAV

51 LARYVILLLK DRRNGVFGSQ IAKRLSGMFT LVAVLPGLFL FGISAQFING

101 TINSWFGNDT HEALERSLNL SKSALDLAAD NAVSNAVPVQ IDLIGTASLS

151 GNMGSVLEHY AGSGFAQLAL YNAASGKIEK SINPHQFDQP LPDKEHWEQI

201 QQTGSVRSLE SIGGVLYAQG WLSAGTHNGR DYALFFRQPI PENVAQDAVL

251 IEKARAKYAE LSYSKKGLQT FFLVTLLIAS LLSIFLALVM ALYFARRFVE

301 PILSLAEGAK AVAQGDFSQT RPVLRNDEFG RLTKLFNHMT EQLSIAKEAD

351 ERNRRREEAA RHYLECVLDG LTTGVVVFDE KGRLKTFNKA AEQILGMPLA

401 PLWGSSRHGW HGVSAQQSLL AEVFAAIGAA AGTDKPVQVE YAAPDDAKIL

451 LGKATVLPED NGNGVVMVID DITVLIRAQK EAAWGEVAKR LAHEIRNPLT

501 PIQLSAERLA WKLGGKLDDQ DAQILTRSTD TIIKQVAALK EMVEAFRNYA

551 RAPSLKLENQ DLNALIGDVL ALYEAGPCRF EAELAGEPLM MAADTTAMRQ

601 VLHNIFKNAA EAAEEADMPE VRVKSETGQD GRIVLTVCDN GKGFGKEMLH

651 NAFEPYVTDK PAGTGLGLPV VKKIIGEHGG RISLSNQDAG GACVRIILPK 

701 TVETYA* 

ORF64ng-1 and ORF64-1 show 93.8% identity in 706 aa overlap:



10 20 30 40 50 60 

orf 64ng-l . pep MRRFLPIAAICAWLLYGLTAATGSTSSLADYFWWIVSFSAMLLLVLSAVLARYVILLLK 
I I M I II I I I I I I I I I I I I I I I I 1 1 I M I I I I I I I I !: I I I I It I ) I I I I 1 I I M I I I M 
orf 64-1 MRRFLPIAAICAVVLLYGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 64ng-l . pep DRRNGVFGSQIAKRLSGMFTLVAVLPGLFLFGISAQFINGTINSWFGNDTHEALERSLNL 
I I I : I I I M I I I I I I I I I I I I I I II I I : I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 64-1 DRRDGVFGSQIAKRLSGMFTLVAVLPGVFLFGVSAQFINGTINSWFGNDTHEALERSLNL 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf64ng-l.pep SKSALDLAADNAVSNAVPVQIDLIGTASLSGNMGSVLEHYAGSGFAQLAL YNAASGKIEK 
I I I I I : I I I ! I I : : I I I I I I I I I I I : I I I I : I I I I I I I I I II I I I I I I I I I I I I I I I I 
orf 64-1 SKSALNLAADNALGNAVPVQIDLIGAASLPGDMGRVLEHYAGSGFAQLAL YNAASGKIEK 

130 140 150 160 170 180 
190 200 210 220 230 240

orf 64ng-l . pep SINPHQFDQPLPDKEHWEQIQQTGSVRSLESIGGVLYAQGWLSAGTHNGRDYALFFRQPI 
I I I I I :: I I I : I I : I I : I I :: I 1 f I : I 1 I I I I I I I I I I I M I I I I I I I I I I HI I I 1 : 
orf 64-1 SINPHKLDQPFPGKARWEKIQRAGSVRDLESIGGVLYAQGWLSAGTHNGRDYALFFRQPV 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 64ng-l . pep PENVAQDAVLIEKARAKYAELSYSKKGLQTFFLVTLLIASLLSIFLALVMALYFARRFVE 
I :: M : I I I I I I I I I I I I I I I I I I I I 1 I I I I M : I I I I I I I I I I I i I I i I I 1 I I I I I I I I 
orf 64-1 PKGVAEDAVLIEKARAKYAELSYSKKGLQTFFLATLLIASLLSIFLALVMALYFARRFVE 

250 260 270 280 290 300 






310 320 330 340 350 360 

orf 64ng-l.pep PILSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEAA 
I : I 1 t 1 t 1 1 I t 1 ) I I I I I I I I 1 I t M I I 1 I I I I I 1 I I I I I M I I I I II I 1 I I I I I I I I I 1 
orf 64-1 PVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEAA 

310 320 330 340 350 360 






370 380 390 400 410 420 

orf 64ng-l . pep RHYLECVLDGLTTGVWFDEKGRLKTFNKAAEQILGMPLAPLWGSSRHGWHGVSAQQSLL 

iiiiiMi:iiiiiiiii!i:i iiiiiiii!Mimf:immi!tiiimim 

orf 64-1 RHYLECVLEGLTTGVWFDEQGCLKTFNKAAEQILGMPLTPLWGSSRHGWHGVSAQQSLL 
370 380 390 400 410 420 






430 440 450 460 470 480 

orf64ng-1.pep  AEVFAAIGAAAGTDKPVQVEYAAPDDAKILLGKATVLPEDNGNGVVMVIDDITVLIRAQK
               |||||||||||||||||:|:||||||||||||||||||||||||||||||||||||:|||
orf64-1        AEVFAAIGAAAGTDKPVHVKYAAPDDAKILLGKATVLPEDNGNGVVMVIDDITVLIHAQK

430 440 450 460 470 480 

490 500 510 520 530 540 

orf64ng-l.pep EAAWGEVAKRLAHEIRNPLTPIQLSAERLAWKLGGKLDDQDAQILTRSTDTIIKQVAALK 
I t II II I I I I I I I I I I II I i I I I It i I I I I I I I 1 I I I I : I I I I I I I I I I I I I : I I I I I I I 
orf 64-1 EAAWGEVAKRLAHEIRNPLTPIQLSAERLAWKLGGKLDEQDAQILTRSTDTIVKQVAALK 

490 500 510 520 530 540 
550 560 570 580 590 600

orf 64ng-l . pep EMVEAFRNYARAPSLKLENQDLNALIGDVLALYEAGPCRFEAELAGEPLMMAADTTAMRQ 
II I I I II II I I : I I II I I I I I I I I I I II I I I I I 1 I I I I I I I I I I I I I I : I I I I I I I I I 
orf 64-1 EMVEAFRNYARSPSLKLENQDLNALIGDVLALYEAGPCRFAAELAGEPLTVAADTTAMRQ 

550 560 570 580 590 600 






610 620 630 640 650 660 

orf 64ng-l . pep VLHNIFKNAAEAAEEADMPEVRVKSETGQDGRIVLTVCDNGKGFGKEMLHNAFEPYVTDK 
I I I I I I I i I II I I I I I I : I I I I I I ! I II I I I I M I I I I! I ! I I I I : I I I I II I I I I I I II 
orf 64-1 VLHNIFKNAAEAAEEADVPEVRVKSETGQDGRIVLTVCDNGKGFGREMLHNAFEPYVTDK 
610 620 630 640 650 660 






                      670       680       690       700
orf64ng-1.pep  PAGTGLGLPVVKKIIGEHGGRISLSNQDAGGACVRIILPKTVETYAX
               ||||||||||||||| ||||||||||||||||||||||||||:||||
orf64-1        PAGTGLGLPVVKKIIEEHGGRISLSNQDAGGACVRIILPKTVKTYAX
                      670       680       690       700

Furthermore, ORF64ng-1 shows significant homology to a protein from A. caulinodans:

sp|Q04850|NTRY_AZOCA NITROGEN REGULATION PROTEIN NTRY >gi|77479|pir||S18624 ntrY
protein - Azorhizobium caulinodans >gi|38737 (X63841) NtrY gene product
[Azorhizobium caulinodans] Length = 771
Score = 218 bits (550), Expect = 7e-56

Identities = 195/720 (27%), Positives = 320/720 (44%), Gaps = 58/720 (8%)

Query: 7 IAAICAWLLYGLTAATGSTSSLADYFWWIXXXXXXXXXXXXXXXXRYVILLLKDRRNGV 66 

I+A+ ++L GLT + + + R + + K R G 

Sbjct: 35 ISALATFLILMGLTPVVPTHQVVIS VLLVNAAAVLILSAMVGREIWRIAKARARGR 90 

Query: 67 FGSQIAKRLSGMFTLVAVLPGLFLFGISAQFINGTINSWFGNDTHEALERSLNLSKSALD 126 

+++ R+ G+F +V+V+P + + +++ ++ ++ WF T E + S++++++ + 
Sbjct : 91 AAARLHIRIVGLFAWSWPAILVAVVASLTLDRGLDRWFSMRTQEIVASSVSVAQTYVR 150 
Query: 127 LAADNAVSNAVPVQIDLIGTASLSGNMGSVLEHYAG--SGFAQLALYNAASGKIEKSINP 184

AN+ + +DL S+ YGSFQ+ AA+++

Sbjct: 151 EHALNIRGDILAMSADLTRLKSV YEGDRSRFNQILTAQAALRNLPGAMLI 200

Query: 185 HQFDQPLPDKEHWEQIQQTGSVRSLESIGGVLYAQGWLSAGTHNGRDYA 233

+ D++++ I+ v + +IG Q + N DY

Sbjct: 201 RR-DLSVVERAN-VNIGREFIVPANLAIGDATPDQPVIYLP--NDADYVAAWPLKDYDD 256

Query: 234 --LFFRQPIPENVAQDAVLIEKARAKYAELSYSKKGLQTFFLVTXXXXXXXXXXXXXVMA 291
L+ + I V ++ A Y L + G+Q F + +

Sbjct: 257 LYLYVARLIDPRVIGYLKTTQETLADYRSLEERRFGVQVAFALMYAVITLIVLLSAVWLG 316

Query: 292 LYFARRFVEPILSLAEGAKAVAQGDFSQTRPVLRND-EFGRLTKLFNHMTEQLSIXXXXX 350
L F++ V PI L A VA+G+ P+ R + + L + FN MT +L

Sbjct: 317 LNFSKWLVAPIRRLMSAADHVAEGNLDVRVPIYRAEGDLASLAETFNKMTHELRSQREAI 376

Query: 351 XXXXXXXXXXXHYLECVLDGLTTGWVFDEKGRLKTFNKAAEQILGMPLAPLWGSSRHGW 410 

+ E VL G+ GV+ D + R+ N++AE++LG L+ + RH 
Sbjct: 377 LTARDQIDSRRRFTEAVLSGVGAGVIGLDSQERITILNRSAERLLG — LSEVEALHRHLA 434 


Query: 411 HGVSAQQSLLAEVFXXXXXXXXTDKPVQVEYAAPDDAKILLGKATVLPEDNG NGWM 467 

V LL E + VQ D + + V E + +G V+ 

Sbjct: 435 EWPETAGLLEEA EHARQRSVQGNITLTRDGRERVFAVRVTTEQSPEAEHGWVV 488 

Query: 468 VIDDITVLIRAQKEAAWGEVAKRLAHEIRNPLTPIQLSAERLAWKLGGKLDDQDAQILTR 527
            +DDIT LI AQ+ +AW +VA+R+AHEI+NPLTPIQLSAERL  K G  +  QD +I  +
Sbjct: 489 TLDDITELISAQRTSAWADVARRIAHEIKNPLTPIQLSAERLKRKFGRHV-TQDREIFDQ 547

Query: 528 STDTIIKQVAALKEMVEAFRNYARAPSLKLENQDLNALIGDVLALYEAGPCRFEAELAGE 587 
TDTII+QV + MV+ F ++AR P +++QD++ +I + L G +

Sbjct: 548 CTDTIIRQVGDIGRMVDEFSSFARMPKPWDSQDMSEIIRQTVFLMRVGHPEVVFDSEVP 607 

Query: 588 PLMMAA- DTTAMRQVLHNI FKNXXXXXXXXDMPEVRVK SETGQDGRIVLTVCD 639 

PMA D -t-QLNIKN P+VR + + G+D +V+ + D 

Sbjct: 608 PAMPARFDRRLVSQALTNILKNAAEAIEAVP-PDVRGQGRIRVSANRVGED--LVIDIID 664

Query: 640 NGKGFGKEMLHNAFEPYVTDKPAGTGLGLPVVKKIIGEHGGRISLSNQDAG-GACVRIIL 698 

NG G +E + EPYVT + GTGLGL +V KI+ EHGG I L++ G GA +R+ L 
Sbjct: 665 NGTGLPQESRNRLLEPYVTTREKGTGLGLAIVGKIMEEHGGGIELNDAPEGRGAWIRLTL 724 






Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 31

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 259>: 

1 ATGTACGCAT TTACCGCCGC ACAGCAACAG AAGGCACTCT TCCGGCTGGT 

51 GCTTTTTCAT ATCCTCATCA TCGCCGCCAG CAACTATCTG GTGCAGTTCC 

101 CTTTCCAAAT TTTCGGCATC CACACCACTT GGGGCGCATT TTCCTTTCCC 

151 TTCATCTTCC TTGCCACCGA CCTGACCGTC CGCATTTTCG GTTCTCACTT

201 GGCACGGCGG ATTATCTTTT GGGTGATGTT CCCCGCCCTT TTGCTTTCCT 

251 ACGTCTTTTC CGTTTTGTTC CACAACGGCA GTTGGACAGG CTTGGGCGCG 

301 CTGTCCGAAT TCAACACCTT TGTCGGACGC ATCGCCTTAG CCAGCTTTGC 

351 CGCCTACGCG ATCGGACAAA TCCTTGATAT TTTTGTATTC AACAAATTAC 

401 GCCGTCTGAA AGCGTGGTGG ATTGCACCGA ACGCATCAAC CGTCATCGGG

451 CACGCGTTGG ATACG . . . 

This corresponds to the amino acid sequence <SEQ ID 260; ORF66>: 



1 MYAFTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFQIFGI HTTWGAFSFP 
51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA
101 LSEFNTFVGR IALASFAAYA IGQILDIFVF NKLRRLKAWW IAPNASTVIG 
151 HALDT... 

Further work revealed the complete nucleotide sequence <SEQ ID 261>:

1 ATGTACGCAT TTACCGCCGC ACAGCAACAG AAGGCACTCT TCCGGCTGGT 

51 GCTTTTTCAT ATCCTCATCA TCGCCGCCAG CAACTATCTG GTGCAGTTCC 

101 CTTTCCAAAT TTTCGGCATC CACACCACTT GGGGCGCATT TTCCTTTCCC 

151 TTCATCTTCC TTGCCACCGA CCTGACCGTC CGCATTTTCG GTTCTCACTT 

201 GGCACGGCGG ATTATCTTTT GGGTGATGTT CCCCGCCCTT TTGCTTTCCT 

251 ACGTCTTTTC CGTTTTGTTC CACAACGGCA GTTGGACAGG CTTGGGCGCG 

301 CTGTCCGAAT TCAACACCTT TGTCGGACGC ATCGCCTTAG CCAGCTTTGC 

351 CGCCTACGCG ATCGGACAAA TCCTTGATAT TTTTGTATTC AACAAATTAC 

401 GCCGTCTGAA AGCGTGGTGG ATTGCACCGA CCGCATCAAC CGTCATCGGC 

451 AACGCCTTGG ATACGCTGGT ATTTTTCGCC GTTGCCTTCT ACGCAAGCAG 

501 CGATGGATTT ATGGCGGCAA ACTGGCAGGG CATCGCTTTT GTCGATTACC 

551 TGTTCAAACT TACCGTCTGC ACCCTCTTCT TCCTGCCCGC CTACGGCGTG 

601 ATACTGAATC TGCTGACGAA AAAACTGACA ACCCTGCAAA CCAAACAGGC 

651 GCAAGACCGC CCCGCGCCCT CGCTGCAAAA TCCGTAA 

This corresponds to the amino acid sequence <SEQ ID 262; ORF66-1>:

1 MYAFTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFQIFGI HTTWGAFSFP

51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA

101 LSEFNTFVGR IALASFAAYA IGQILDIFVF NKLRRLKAWW IAPTASTVIG

151 NALDTLVFFA VAFYASSDGF MAANWQGIAF VDYLFKLTVC TLFFLPAYGV

201 ILNLLTKKLT TLQTKQAQDR PAPSLQNP*
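
The correspondence between a nucleotide listing and its amino acid listing can be checked with a short translation routine. The following is a minimal sketch, assuming the standard genetic code and reading frame 1; the names `CODON_TABLE` and `translate` are illustrative and are not part of the original disclosure. As a simplification, any codon containing a base other than A, C, G or T is reported as X.

```python
# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; whitespace and case are ignored."""
    seq = "".join(dna.split()).upper()
    protein = []
    # Step through complete codons only; a trailing partial codon is dropped.
    for pos in range(0, len(seq) - 2, 3):
        # Codons with ambiguity codes (e.g. N, Y, W) are not in the table.
        protein.append(CODON_TABLE.get(seq[pos:pos + 3], "X"))
    return "".join(protein)


# First 30 nucleotides of SEQ ID 261 as printed above.
first_row = "ATGTACGCAT TTACCGCCGC ACAGCAACAG"
print(translate(first_row))      # MYAFTAAQQQ
# Last three codons of SEQ ID 261 (the stop codon is shown as '*').
print(translate("AAT CCG TAA"))  # NP*
```

Running the same routine over a full listing is a quick way to spot single-character transcription errors in either the DNA or the protein text.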

Computer analysis of this amino acid sequence gave the following results: 

Homology with the hypothetical protein o221 of E. coli (accession number P37619)
ORF66 and o221 protein show 67% aa identity in 155aa overlap: 



orf66    1 MYAFTAAQQQKALFRLVLFHILIIAASNYLVQFPFQIFGIHTTWGAFSFPFIFLATDLTV 60
           M  F+  Q+ KALF L LFH+L+I +SNYLVQ P  I G HTTWGAFSFPFIFLATDLTV
o221     1 MNVFSQTQRYKALFWLSLFHLLVITSSNYLVQLPVSILGFHTTWGAFSFPFIFLATDLTV 60

orf66   61 RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA 120
           RIFG+ LARRIIF VM PALL+SYV S LF+ GSW G GAL+ FN FV RIA ASF AYA
o221    61 RIFGAPLARRIIFAVMIPALLISYVISSLFYMGSWQGFGALAHFNLFVARIATASFMAYA 120

orf66  121 IGQILDIFVFNKLRRLKAWWIAPNASTVIGHALDT 155
           +GQILD+ VFN+LR+ + WW+AP AST+ G+  DT
o221   121 LGQILDVHVFNRLRQSRRWWLAPTASTLFGNVSDT 155

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF66 shows 96.1% identity over a 155aa overlap with an ORF (ORF66a) from strain A of N.
meningitidis:

                    10        20        30        40        50        60
orf66.pep   MYAFTAAQQQKALFRLVLFHILIIAASNYLVQFPFQIFGIHTTWGAFSFPFIFLATDLTV
            |||||||||||||| |||||||||||||||||||||| ||||||||||||||||||||||
orf66a      MYAFTAAQQQKALFWLVLFHILIIAASNYLVQFPFQISGIHTTWGAFSFPFIFLATDLTV
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf66.pep   RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf66a      RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA
                    70        80        90       100       110       120

                   130       140       150
orf66.pep   IGQILDIFVFNKLRRLKAWWIAPNASTVIGHALDT
            :|||||||||||||||||||:||:||||||:||||
orf66a      LGQILDIFVFNKLRRLKAWWVAPTASTVIGNALDTLVFFAVAFYASSDGFMAANWQGIAF
                   130       140       150       160       170       180

orf66a      VDYLFKLTVCGLFFLPAYGVILNLLTKKLTTLQTKQAQDRPAPSLQNPX
                   190       200       210       220

The complete length ORF66a nucleotide sequence <SEQ ID 263> is: 

1 ATGTACGCAT TTACCGCCGC ACAGCAACAG AAGGCACTCT TCTGGCTGGT 

51 GCTTTTTCAT ATCCTCATCA TCGCCGCCAG CAACTATCTG GTGCAGTTCC 

101 CCTTCCAAAT TTCCGGCATC CACACCACTT GGGGCGCGTT TTCCTTTCCC 

151 TTCATCTTCC TCGCCACCGA CCTGACCGTC CGCATTTTCG GTTCGCACTT 

201 GGCACGGCGG ATTATCTTTT GGGTCATGTT CCCCGCCCTT TTGCTTTCCT 

251 ACGTCTTTTC CGTTTTGTTC CACAACGGCA GTTGGACGGG CTTGGGCGCG 

301 CTGTCCGAAT TCAACACCTT TGTCGGACGC ATCGCGCTGG CAAGTTTTGC 

351 CGCCTACGCG CTCGGACAAA TCCTTGATAT TTTTGTGTTC AACAAATTAC 

401 GCCGTCTGAA AGCGTGGTGG GTTGCCCCGA CTGCATCAAC CGTCATCGGC 

451 AACGCCTTAG ATACGTTGGT ATTTTTCGCC GTTGCCTTCT ACGCAAGCAG 

501 CGATGGATTT ATGGCGGCAA ACTGGCAGGG CATCGCTTTT GTCGATTACC 

551 TGTTCAAACT CACCGTCTGC GGTCTGTTTT TCCTGCCCGC CTACGGCGTG 

601 ATTCTGAATC TGCTGACGAA AAAACTGACG ACCCTGCAAA CCAAACAGGC 

651 GCAAGACCGC CCCGCGCCCT CGCTGCAAAA TCCGTAA 

This encodes a protein having amino acid sequence <SEQ ID 264>: 

1 MYAFTAAQQQ KALFWLVLFH ILIIAASNYL VQFPFQISGI HTTWGAFSFP

51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA

101 LSEFNTFVGR IALASFAAYA LGQILDIFVF NKLRRLKAWW VAPTASTVIG

151 NALDTLVFFA VAFYASSDGF MAANWQGIAF VDYLFKLTVC GLFFLPAYGV

201 ILNLLTKKLT TLQTKQAQDR PAPSLQNP* 

ORF66a and ORF66-1 show 97.8% identity in 228 aa overlap: 

                    10        20        30        40        50        60
orf66a.pep  MYAFTAAQQQKALFWLVLFHILIIAASNYLVQFPFQISGIHTTWGAFSFPFIFLATDLTV
            |||||||||||||| |||||||||||||||||||||| ||||||||||||||||||||||
orf66-1     MYAFTAAQQQKALFRLVLFHILIIAASNYLVQFPFQIFGIHTTWGAFSFPFIFLATDLTV
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf66a.pep  RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf66-1     RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf66a.pep  LGQILDIFVFNKLRRLKAWWVAPTASTVIGNALDTLVFFAVAFYASSDGFMAANWQGIAF
            :|||||||||||||||||||:|||||||||||||||||||||||||||||||||||||||
orf66-1     IGQILDIFVFNKLRRLKAWWIAPTASTVIGNALDTLVFFAVAFYASSDGFMAANWQGIAF
                   130       140       150       160       170       180

                   190       200       210       220      229
orf66a.pep  VDYLFKLTVCGLFFLPAYGVILNLLTKKLTTLQTKQAQDRPAPSLQNPX
            |||||||||| ||||||||||||||||||||||||||||||||||||||
orf66-1     VDYLFKLTVCTLFFLPAYGVILNLLTKKLTTLQTKQAQDRPAPSLQNPX
                   190       200       210       220
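
The identity figures quoted for these gapless alignments are the number of matching positions divided by the overlap length. The sketch below is illustrative only: `percent_identity` is a hypothetical helper, not the alignment program used to produce the printouts, and it assumes two already-aligned strings of equal length with no gaps. The ORF66a and ORF66-1 sequences as printed in the alignment appear to differ at five positions over the 228 aa overlap, which is consistent with the stated 97.8%.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# First 20 residues of ORF66a and ORF66-1 (one mismatch, W versus R).
orf66a_start = "MYAFTAAQQQKALFWLVLFH"
orf66_1_start = "MYAFTAAQQQKALFRLVLFH"
print(round(percent_identity(orf66a_start, orf66_1_start), 1))  # 95.0

# Assumed count of 223 matches out of 228 aligned residues.
print(round(100.0 * 223 / 228, 1))  # 97.8
```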

Homology with a predicted ORF from N. gonorrhoeae 

ORF66 shows 94.2% identity over a 155aa overlap with a predicted ORF (ORF66.ng) from N.
gonorrhoeae:

orf66.pep  MYAFTAAQQQKALFRLVLFHILIIAASNYLVQFPFQIFGIHTTWGAFSFPFIFLATDLTV 60
           |||:|||||||||||||||||||||||||||||||:||||||||||||||||||||||||
orf66ng    MYALTAAQQQKALFRLVLFHILIIAASNYLVQFPFRIFGIHTTWGAFSFPFIFLATDLTV 60

orf66.pep  RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA 120
           |||||||||||||||||||| ||||||||||||||||||| |:|||||||||||||||||
orf66ng    RIFGSHLARRIIFWVMFPALSLSYVFSVLFHNGSWTGLGAPSQFNTFVGRIALASFAAYA 120

orf66.pep  IGQILDIFVFNKLRRLKAWWIAPNASTVIGHALDT 155
           :|||||||||:|||||||||||| ||||||:||||
orf66ng    LGQILDIFVFDKLRRLKAWWIAPAASTVIGNALDTLVFFAVAFYASSDEFMAANWQGIAF 180

The complete length ORF66ng nucleotide sequence <SEQ ID 265> is:



1 ATGTACGCAT TGACCGCCGC ACAGCAACAG AAGGCACTCT TCCGGCTGGT

51 GCTTTTCCAT ATCCTCATCA TCGCCGCCAG CAACTATCTG GTGCAGTTCC

101 CCTTCCGGAT TTTCGGCATC CACACCACTT GGGGCGCGTT TTCCTTTCCC

151 TTCATCTTCC TCGCCACCGA CCTGACCGTC CGCATTTTCG GTTCGCACTT

201 GGCGCGGCGG ATTATCTTTT GGGTGATGTT CCCCGCCCTT ttgCTTTcat

251 aCGTCTTTTC CGTTTTGTTC CACAACGGCA GTTGGACGGG CTTGGGCGCG

301 ctgTCCCAAT TCAACACCTT TGTCGGACGC ATCGCGCTGG CAAGTTTTGC

351 CGCCTACGCG CTCGGACAAA TCCTTGATAT TTTCGTATTC GACAAATTAC

401 GCCGTCTGAA AGCGTGGTGG ATTGCCCCGG CCGCATCAAC CGTCATCGGC

451 AATGCACTGG ACACGTTAGT ATTTTTTGCC GTTGCCTTTT ACGCAAGCAG

501 CGATGAATTT ATGGCGGCAA ACTGGCAGGG CATCGCTTTT GTCGATTACC

551 TGTTCAAACT TACCGTCTGC ACCCTCTTCT TCCTGCCCGC CTACGGCGTG

601 ATACTGAATC TGCTGACGAA AAAACTGACG GCCCTGCAAA CCAAACAGGC

651 GCAAGACCGC CCCGTGCCCT CGCTGCAAAA TCCGTAA



This encodes a protein having amino acid sequence <SEQ ID 266>: 



1 MYALTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFRIFGI HTTWGAFSFP

51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL SLSYVFSVLF HNGSWTGLGA

101 PSQFNTFVGR IALASFAAYA LGQILDIFVF DKLRRLKAWW IAPAASTVIG

151 NALDTLVFFA VAFYASSDEF MAANWQGIAF VDYLFKLTVC TLFFLPAYGV

201 ILNLLTKKLT ALQTKQAQDR PVPSLQNP*

An alternative annotated sequence is: 



1 MYALTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFRIFGI HTTWGAFSFP

51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA

101 LSQFNTFVGR IALASFAAYA LGQILDIFVF DKLRRLKAWW IAPAASTVIG

151 NALDTLVFFA VAFYASSDEF MAANWQGIAF VDYLFKLTVC TLFFLPAYGV

201 ILNLLTKKLT ALQTKQAQDR PVPSLQNP*

ORF66ng and ORF66-1 show 96.1% identity in 228 aa overlap: 



orf66-1.pep  MYAFTAAQQQKALFRLVLFHILIIAASNYLVQFPFQIFGIHTTWGAFSFPFIFLATDLTV 60
             |||:|||||||||||||||||||||||||||||||:||||||||||||||||||||||||
orf66ng      MYALTAAQQQKALFRLVLFHILIIAASNYLVQFPFRIFGIHTTWGAFSFPFIFLATDLTV 60

orf66-1.pep  RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA 120
             ||||||||||||||||||||||||||||||||||||||||||:|||||||||||||||||
orf66ng      RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSQFNTFVGRIALASFAAYA 120

orf66-1.pep  IGQILDIFVFNKLRRLKAWWIAPTASTVIGNALDTLVFFAVAFYASSDGFMAANWQGIAF 180
             :|||||||||:||||||||||||:|||||||||||||||||||||||| |||||||||||
orf66ng      LGQILDIFVFDKLRRLKAWWIAPAASTVIGNALDTLVFFAVAFYASSDEFMAANWQGIAF 180

orf66-1.pep  VDYLFKLTVCTLFFLPAYGVILNLLTKKLTTLQTKQAQDRPAPSLQNPX 229
             ||||||||||||||||||||||||||||||:||||||||||:|||||||
orf66ng      VDYLFKLTVCTLFFLPAYGVILNLLTKKLTALQTKQAQDRPVPSLQNPX 229

Furthermore, ORF66ng shows significant homology with an E. coli ORF:

sp|P37619|YHHQ_ECOLI HYPOTHETICAL 25.3 KD PROTEIN IN FTSY-NIKA INTERGENIC
REGION (O221)

>gi|1073495|pir||S47690 hypothetical protein o221 - Escherichia coli >gi|466607
(U00039) No definition line found [Escherichia coli] >gi|1789882 (AE000423)
hypothetical 25.3 kD protein in ftsY-nikA intergenic region [Escherichia coli]
Length = 221

Score = 273 bits (692), Expect = 5e-73

Identities = 132/203 (65%), Positives = 155/203 (76%)

Query: 1   MYALTAAQQQKALFRLVLFHILIIAASNYLVQFPFRIFGIHTTWGAFSFPFIFLATDLTV 60
           M   +  Q+ KALF L LFH+L+I +SNYLVQ P  I G HTTWGAFSFPFIFLATDLTV
Sbjct: 1   MNVFSQTQRYKALFWLSLFHLLVITSSNYLVQLPVSILGFHTTWGAFSFPFIFLATDLTV 60

Query: 61  RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSQFNTFVGRIALASFAAYA 120
           RIFG+ LARRIIF VM PALL+SYV S LF+ GSW G GAL+ FN FV RIA ASF AYA
Sbjct: 61  RIFGAPLARRIIFAVMIPALLISYVISSLFYMGSWQGFGALAHFNLFVARIATASFMAYA 120

Query: 121 LGQILDIFVFDKLRRLKAWWIAPAASTVIGNALDTLVFFAVAFYASSDEFMAANWQGIAF 180
           LGQILD+ VF++LR+ + WW+AP AST+ GN  DTL FF +AF+ S D FMA +W  IA
Sbjct: 121 LGQILDVHVFNRLRQSRRWWLAPTASTLFGNVSDTLAFFFIAFWRSPDAFMAEHWMEIAL 180

Query: 181 VDYLFKLTVCTLFFLPAYGVILN 203
           VDY FK+ +  +FFLP YGV+LN
Sbjct: 181 VDYCFKVLISIVFFLPMYGVLLN 203

Based on this analysis, including the homology with the E. coli protein and the presence of several
putative transmembrane domains in the gonococcal protein, it is predicted that these proteins from 
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or 
diagnostics, or for raising antibodies. 



Example 32 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 267>: 

1 ATGGTCATAA AATATACAAA TTTGAATTTT GCGAAATTGT CGATAATTGC 

51 AATTTTGATG ATGTATTCGT TTGAAGCGAA TGCAAAyGCA GTmwrAATAT 

101 CTGAAACTGT TTCAGTTGAT ACCGGACAAG GTGCGAAAAT TCATAAGTTT 

151 GTACCTAAAA ATAGTAAAAC TTATTCATCT GATTTAATAA AAACGGTAGA 

201 TTTAACACAC AyyCCTACGG GCGCAAAAGC CCGAATCAAC GCCAAAATAA 

251 CCGCCAGCGT ATCCCGCGCC GGCGTATTGG CGGGGGTCGG CAAACTTGCC 

301 CGCTTAGgCG CGAAATTCAG CACAAGGGCG GTtCCCTATG TCGGAACAGC 

351 CcTTTTAGCC CACGACGTAT ACGAAAcTTT CAAAGAAGAC ATACAGGCAC 

401 GAGGCTACCA ATACGACCCC GAAACCGACA AATTTGTAAA AGGCTACGAA 

451 TATAGTAATT GCCTTTGGTA CGAAGACAAA AGACGTATTA ATAGAACCTA 

501 TGGCTGCTAC GGCGTTGAT . . 

This corresponds to the amino acid sequence <SEQ ID 268; ORF72>: 

1 MVIKYTNLNF AKLSIIAILM MYSFEANANA VXISETVSVD TGQGAKIHKF 

51 VPKNSKTYSS DLIKTVDLTH XPTGAKARIN AKITASVSRA GVLAGVGKLA 

101 RLGAKFSTRA VPYVGTALLA HDVYETFKED IQARGYQYDP ETDKFVKGYE 

151 YSNCLWYEDK RRINRTYGCY GVD. . 

Further work revealed the complete nucleotide sequence <SEQ ID 269>: 

1 ATGGTCATAA AATATACAAA TTTGAATTTT GCGAAATTGT CGATAATTGC 

51 AATTTTGATG ATGTATTCGT TTGAAGCGAA TGCAAATGCA GTAAAAATAT 

101 CTGAAACTGT TTCAGTTGAT ACCGGACAAG GTGCGAAAAT TCATAAGTTT 

151 GTACCTAAAA ATAGTAAAAC TTATTCATCT GATTTAATAA AAACGGTAGA 

201 TTTAACACAC ATCCCTACGG GCGCAAAAGC CCGAATCAAC GCCAAAATAA 

251 CCGCCAGCGT ATCCCGCGCC GGCGTATTGG CGGGGGTCGG CAAACTTGCC 

301 CGCTTAGGCG CGAAATTCAG CACAAGGGCG GTTCCCTATG TCGGAACAGC 

351 CCTTTTAGCC CACGACGTAT ACGAAACTTT CAAAGAAGAC ATACAGGCAC 

401 GAGGCTACCA ATACGACCCC GAAACCGACA AATTTGCAAA GGTCTCAGGC 

451 TAA

This corresponds to the amino acid sequence <SEQ ID 270; ORF72-1>:

1 MVIKYTNLNF AKLSIIAILM MYSFEANANA VKISETVSVD TGQGAKIHKF

51 VPKNSKTYSS DLIKTVDLTH IPTGAKARIN AKITASVSRA GVLAGVGKLA 

101 RLGAKFSTRA VPYVGTALLA HDVYETFKED IQARGYQYDP ETDKFAKVSG 

151 * 
Computer analysis of this amino acid sequence gave the following results:
Homology with a predicted ORF from N. meningitidis (strain A)

ORF72 shows 98.0% identity over a 147aa overlap with an ORF (ORF72a) from strain A of N.
meningitidis:

                    10        20        30        40        50        60
orf72.pep   MVIKYTNLNFAKLSIIAILMMYSFEANANAVXISETVSVDTGQGAKIHKFVPKNSKTYSS
            ||||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||
orf72a      MVIKYTNLNFAKLSIIAILMMYSFEANANAVKISETVSVDTGQGAKIHKFVPKNSKTYSS
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf72.pep   DLIKTVDLTHXPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA
            |||||||||| |||||||||||||||||||||||||||||||||||||||||||||||||
orf72a      DLIKTVDLTHIPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA
                    70        80        90       100       110       120

                   130       140       150       160       170
orf72.pep   HDVYETFKEDIQARGYQYDPETDKFVKGYEYSNCLWYEDKRRINRTYGCYGVD
            |||||||||||||||||||||||||:|
orf72a      HDVYETFKEDIQARGYQYDPETDKFAKVSGX
                   130       140       150

The complete length ORF72a nucleotide sequence <SEQ ID 271> is:

1 ATGGTCATAA AATATACAAA TTTGAATTTT GCGAAATTGT CGATAATTGC 

51 AATTTTGATG ATGTATTCGT TTGAAGCGAA TGCAAATGCA GTAAAAATAT 

101 CTGAAACTGT TTCAGTTGAT ACCGGACAAG GTGCGAAAAT TCATAAGTTT 

151 GTACCTAAAA ATAGTAAAAC TTATTCATCT GATTTAATAA AAACGGTAGA 

201 TTTAACACAC ATCCCTACGG GCGCAAAAGC CCGAATCAAC GCCAAAATAA 

251 CCGCCAGCGT ATCCCGCGCC GGCGTATTGG CGGGGGTCGG CAAACTTGCC 

301 CGCTTAGGCG CGAAATTCAG CACAAGGGCG GTTCCCTATG TCGGAACAGC 

351 CCTTTTAGCC CACGACGTAT ACGAAACTTT CAAAGAAGAC ATACAGGCAC 

401 GAGGCTACCA ATACGACCCC GAAACCGACA AATTTGCAAA GGTCTCAGGC 

451 TAA 

This encodes a protein having amino acid sequence <SEQ ID 272>: 

1 MVIKYTNLNF AKLSIIAILM MYSFEANANA VKISETVSVD TGQGAKIHKF
51 VPKNSKTYSS DLIKTVDLTH IPTGAKARIN AKITASVSRA GVLAGVGKLA
101 RLGAKFSTRA VPYVGTALLA HDVYETFKED IQARGYQYDP ETDKFAKVSG
151 * 

ORF72a and ORF72-1 show 100.0% identity in 150 aa overlap: 

10 20 30 40 50 60 

orf 72a . pep MVIKYTNLNFAKLSIIAILMMYSFEANANAVKISETVSVDTGQGAKIHKFVPKNSKTYSS 
I I ( ! I I I I t I I I I I M I I I I I I I I I II I I I ! I I II I 1 I I I I I I I I I I I II I I I I I I I I I I 
orf 72-1 MVIKYTNLNFAKLSIIAILMMYSFEANANAVKISETVSVDTGQGAKIHKFVPKNSKTYSS 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 72a . pep DLIKTVDLTHIPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA 
I I 1 II I I I I II II I II I 1 I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I 1 I I I I I I 
orf 72-1 DLIKTVDLTHIPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA 

70 80 90 100 110 120 

130 140 150 

orf 72a . pep HDVYETFKEDIQARGYQYDPETDKFAKVSGX 
I I I I I I I I I I I II I I I I I I II II I I I ! I I II 
orf 72-1 HDVYETFKEDIQARGYQYDPETDKFAKVSGX 
130 140 150 
Homology with a predicted ORF from N. gonorrhoeae

ORF72 shows 89% identity over a 173aa overlap with a predicted ORF (ORF72.ng) from N.
gonorrhoeae:

orf72 pep MVIKYTNLNFAKLSIIAILMMYSFEANANAVXISETVSVDTGQGAKIHKFVPKNSKTYSS 60 

|| Irlimillllflliliinillllll lll!:ltliltltl:l!llll:l: III 
orf72ng MVTKHTNLNFAKLSIIAILMMYSFEANANAVKISETLSVDTGQGAKVHKFVPKSSNIYSS 60 

orf72 pep DLIKTVDLTHXPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA 120 

M hllili llltllltlliltlllllllll:lllll:| 1111:1111111111111 
orf72ng DLTKAVDLTHIPTGAKARINAKITASVSRAGVLSGVGKLVRQGAKFGTRAVPYVGTALLA 120

orf72 pep HDVYETFKEDIQARGYQYDPETDKFVKGYEYSNCLWYEDKRRINRTYGCYGVD 173 

I I I I I I I I I t I I II I :tlllllllllllll:IIIMII:liliillliillf 

orf72ng HDVYETFKEDIQARGCRYDPETDKFVKGYEYANCLWYEDERRINRTYGCYGVDSSIMRLM 180

An ORF72ng nucleotide sequence <SEQ ID 273> was predicted to encode a protein having amino 
acid sequence <SEQ ID 274>: 

1 MVTKHTNLNF AKLSIIAILM MYSFEANANA VKISETLSVD TGQGAKVHKF

51 VPKSSNIYSS DLTKAVDLTH IPTGAKARIN AKITASVSRA GVLSGVGKLV

101 RQGAKFGTRA VPYVGTALLA HDVYETFKED IQARGCRYDP ETDKFVKGYE

151 YANCLWYEDE RRINRTYGCY GVDSSIMRLM PDRSRFPEVK QLMESQMYRL

201 ARPFWNWRKE ELNKLSSLDW NNFVLNRCTF DWNGGGCAVN KGDDFRAGAS

251 FSLGRNPKYK EEMDAKKPEE ILSLKVDADP DKYIEATGYP GYSEKVEVAP

301 GTKVNMGPVT DRNGNPVQVA ATFGRDAQGN TTADVQVIPR PDLTPASAEA

351 PHAQPLPEVS PAENPANNPD PDENPGTRPN PEPDPDLNPD ANPDTDGQPG

401 TSPDSPAVPD RPNGRHRKER KEGEDGGLSC DYFPEILACQ EMGKPSDRMF

451 HDISIPQVTD DKTWSSHNFL PSNGVCPQPK TFHVFGRQYR ASYEPLCVFA

501 EKIRFAVLLA FIIMSAFWFG SLGGE*

After further analysis, the following gonococcal DNA sequence <SEQ ID 275> was identified: 

1 ATGGTCACAA AACATACAAA TTTGAATTTT GCGAAATTGT CGATAATTGC 

51 AATTTTGATG ATGTATTCGT TTGAAGCGAA TGCAAATGCA GTAAAAATAT 

101 CTGAAACTCT TTCGGTTGAT ACCGGACAAG GCGCGAAAGT TCATAAGTTC 

151 GTTCCTAAAT CAAGTAATAT TTATTCATCT GATTTAACAA AAGCGGTAGA 

201 TTTAACGCAT ATCCCCACGG GCGCAAAAGC CCGAATCAAC GCCAAAATAA 

251 CCGCCAGCGT ATCCCGCGCC GGCGTATTGT CGGGGGTCGG CAAACTTGTC 

301 CGCCAAGGCG CGAAATTCGG CACAAGGGCG GTTCCCTATG TCGGAACAGC 

351 CCTTTTAGCC CACGACGTAT ACGAAACTTT CAAAGAAGAC ATACAGGCAC 

401 GAGGCTGCCG ATACGATCCC GAAACCGACA AATTT 

This corresponds to the amino acid sequence <SEQ ID 276; ORF72ng-1>:

1 MVTKHTNLNF AKLSIIAILM MYSFEANANA VKISETLSVD TGQGAKVHKF
51 VPKSSNIYSS DLTKAVDLTH IPTGAKARIN AKITASVSRA GVLSGVGKLV
101 RQGAKFGTRA VPYVGTALLA HDVYETFKED IQARGCRYDP ETDKF

ORF72ng-1 and ORF72-1 show 89.7% identity in 145 aa overlap:

10 20 30 40 50 60 

orf72ng-1.pe MVTKHTNLNFAKLSIIAILMMYSFEANANAVKISETLSVDTGQGAKVHKFVPKSSNIYSS

II I : I I I I I I I I 11 I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I : I I I I I I I : III 
orf72-l MVIKYTNLNFAKLSIIAILMMYSFEANANAVKISETVSVDTGQGAKIHKFVPKNSKTYSS 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 72ng-l . pe DLTKAVDLTHIPTGAKARINAKITASVSRAGVLSGVGKLVRQGAKFGTRAVPYVGTALLA 
II I : I I I I I I I I I I I I I I I I I I ! I I I I I II I I : I I I II : I I I I I : I I I I I I I I I I I I I 
orf72-l DLIKTVDLTHIPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA 
70 80 90 100 110 120 



                      130       140
orf72ng-1.pe   HDVYETFKEDIQARGCRYDPETDKF
               ||||||||||||||| :||||||||
orf72-1        HDVYETFKEDIQARGYQYDPETDKFAKVSGX
                      130       140       150

Based on this analysis, including the presence of a putative leader sequence and transmembrane
domains in the gonococcal protein, it is predicted that the proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.

Example 33 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 277>: 

1 ATGAGATTTT TCGGTATCGG TTTTTTGGTG CTGCTGTTTT TGGAGATTAT 

51 GTCGATTGTG TGGGTTGCCG ATTGGCTGGG CGGCGGCTGG ACGTTGTTTT 

101 TGATGGCGGC AGGTTTTGCC GCCGGCGTGC TGATGCTCAG GCAAACCGGG 

151 GCTGACCGGT CTTTTATTGG CGGGCGCGGC AATGAGAAGC GGCGGGAAGG 

201 TATCCGTTTA TCAGATGTTG TGGCCTATC. . 

This corresponds to the amino acid sequence <SEQ ID 278; ORF73>: 

1 MRFFGIGFLV LLFLEIMSIV WVADWLGGGW TLFLMAAGFA AGVLMLRQTG 
51 LTGLLLAGAA MRSGGKVSVY QMLWPI.. 

Further work revealed the complete nucleotide sequence <SEQ ID 279>: 

1 ATGAGATTTT TCGGTATCGG TTTTTTGGTG CTGCTGTTTT TGGAGATTAT 

51 GTCGATTGTG TGGGTTGCCG ATTGGCTGGG CGGCGGCTGG ACGTTGTTTT 

101 TGATGGCGGC AGGTTTTGCC GCCGGCGTGC TGATGCTCAG GCATACGGGG 

151 CTGTCCGGTC TTTTATTGGC GGGCGCGGCA ATGAGAAGCG GCGGGAGGGT 

201 ATCCGTTTAT CAGATGTTGT GGCCTATCCG TTATACGGTG GCGGCTGTGT 

251 GTCTGATGAG TCCGGGATTC GTATCCTCGG TGTTGGCGGT ATTGCTGCTG 

301 CTGCCGTTTA AGGGAGGGGC AGTGTTGCAG GCAGGAGGTG CGGAAAATTT 

351 TTTCAACATG AACCAATCGG GCAGAAAAGA GGGCTTTTCC CGCGATGACG 

401 ATATTATCGA GGGAGAATAT ACGGTTGAAG AGCCTTACGG CGGCAATCGT

451 TCCCGAAACG CCATCGAACA CAAAAAAGAC GAATAA 

This corresponds to the amino acid sequence <SEQ ID 280; ORF73-1>:

1 MRFFGIGFLV LLFLEIMSIV WVADWLGGGW TLFLMAAGFA AGVLMLRHTG

51 LSGLLLAGAA MRSGGRVSVY QMLWPIRYTV AAVCLMSPGF VSSVLAVLLL

101 LPFKGGAVLQ AGGAENFFNM NQSGRKEGFS RDDDIIEGEY TVEEPYGGNR 

151 SRNAIEHKKD E* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF73 shows 90.8% identity over a 76aa overlap with an ORF (ORF73a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf 73 . pep MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAAGFAA GVLMLRQTGLTGLLLAGAA 
I I I I I I t f t I I ! I f I I I I I I II I I i I I i I I I I I I I f I I I I I I : I I I : t I I : I I ! I I I I I 
orf 73a MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAATFAA GVVMLRHTGLSGLLLAGAA 

10 20 30 40 50 60 

70 

orf 73 .pep MRSGGKVSVYQMLWPI 
I I I I I : II I I III I 

orf 7 3a MRSGGRVSVYXMLWXIRYTVAAVC XMSPGFVSSVXAVLLXL PFKGGAVLQAGGAENFFNM 
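
The identity figure quoted for ORF73 and ORF73a can be checked with a short sketch. It assumes an ungapped, position-by-position comparison over the 76 aa overlap and treats an undetermined residue (`X`) as a mismatch; the function name `percent_identity` is hypothetical, and the alignment program used in the original analysis may score positions differently.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two equal-length, already aligned sequences.

    Positions where either residue is 'X' (undetermined) or '-' (gap)
    are counted as mismatches.
    """
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(
        1 for a, b in zip(seq_a, seq_b) if a == b and a not in "X-"
    )
    return 100.0 * matches / len(seq_a)


# ORF73 (SEQ ID 278, 76 aa) and the first 76 residues of ORF73a (SEQ ID 282)
ORF73 = (
    "MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAAGFA"
    "AGVLMLRQTGLTGLLLAGAAMRSGGKVSVYQMLWPI"
)
ORF73A_N76 = (
    "MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAATFA"
    "AGVVMLRHTGLSGLLLAGAAMRSGGRVSVYXMLWXI"
)

print(round(percent_identity(ORF73, ORF73A_N76), 1))  # 90.8
```

Seven of the 76 positions differ (including the two undetermined residues in ORF73a), giving 69/76, which rounds to the stated 90.8%.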
The complete length ORF73a nucleotide sequence <SEQ ID 281> is:

1 ATGAGATTTT TCGGTATCGG TTTTTTGGTG CTGCTGTTTT TGGAGATTAT 

51 GTCGATTGTG TGGGTTGCCG ATTGGTTGGG CGGCGGTTGG ACGCTGTTTC 

101 TAATGGCGGC AACCTTTGCC GCCGGCGTGG TGATGCTCAG GCATACGGGG 

151 CTGTCCGGTC TTTTATTGGC GGGCGCGGCA ATGAGAAGCG GCGGGAGGGT

201 ATCCGTTTAT CANATGTTGT GGCNTATCCG TTATACGGTG GCGGCGGTGT 

251 GTCNGATGAG TCCGGGATTC GTATCCTCGG TGTNGGCGGT ATTGCTGNTG 

301 CTNCCGTTTA AGGGAGGTGC AGTGTTGCAG GCAGGAGGTG CGGAAAATTT 

351 TTTCAACATG AACCANTCGG GCAGAAAAGA NGGCNTTTCC CGCGATGACG 

401 ATATTATCGA GGGGGAATAT ACGGTTGAAG ANCCTTACGG CGGCANTCGT

451 TTCCGAAACG CCNTNGAACA CAAAAAAGAC GAATAA 

This encodes a protein having amino acid sequence <SEQ ID 282>: 

1 MRFFGIGFLV LLFLEIMSIV WVADWLGGGW TLFLMAATFA AGVVMLRHTG 

51 LSGLLLAGAA MRSGGRVSVY XMLWXIRYTV AAVCXMSPGF VSSVXAVLLX

101 LPFKGGAVLQ AGGAENFFNM NXSGRKXGXS RDDDIIEGEY TVEXPYGGXR

151 FRNAXEHKKD E* 

ORF73a and ORF73-1 show 91.3% identity in 161 aa overlap 

10 20 30 40 50 60 

orf73a pep MRFFGIGFLVLLFLE IMS I VWVADWLGGGWTLFLMAATFAAGWMLRHTGLSGLLLAGAA 
20 I II i ill I I I I 1 II i II I f I I i I I I H I I Ml I i I t i I I I I i : I t i I I I I 1 i M I III I 

orf 73-1 MRFFGIGFLVLLFLE IMS IVWVADWLGGGWTLFLMAAGFAAGVLMLRHTGLSGLLLAGAA 

10 20 30 40 50 60 

70 80 90 100 110 120 

25 orf 7 3a . pep MRSGGRVSVYXMLWXIRYTVAAVCXMSPGFVSSVXAVLLXLPFKGGAVLQAGGAENFFNM 

I I M I I I I II III I I I I I I I I I I I I I I I I I I Mil II I I M I II I M M M M M 
orf 7 3-1 MRSGGRVSVYQMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM 
70 80 90 100 110 120 

30 130 140 150 160 

orf 73a. pep NXSGRKXGXSRDDDIIEGEYTVEXPYGGXRFRNAXEHKKDEX 
I MM I I M M I I II II II I MM I Ml I M I I M 
orf 73-1 NQSGRKEGFSRDDDIIEGEYTVEEPYGGNRSRNAIEHKKDEX 
130 140 150 160 

35 

Homology with a predicted ORF from N. gonorrhoeae 

ORF73 shows 92.1% identity over a 76aa overlap with a predicted ORF (ORF73.ng) from N. 
gonorrhoeae: 

orf 73 .pep MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAAGFAAGVLMLRQTGLTGLLLAGAA 60 
40 I I I M M I I M I I II I M II I I M II I M I M II II I II I II I M I M I M M I I M M 

orf73ng MRFFGIGFLVLLFLE IMSIVWVADWLGGGWTLFLMAATFAAGVLMLRHTGLSGLLLAGAA 60 

orf 7 3. pep MRSGGKVSVYQMLWPI 7 6 

: :\ :\ II I M II I M I 

45 orf 7 3ng VKSSGKVSVYQMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM 120 

The complete length ORF73ng nucleotide sequence <SEQ ID 283> is: 

1 ATGAGATTTT TCGGTATCGG TTTTTTGGTG CTGCTGTTTT TGGAAATTAT 

51 GTCGATTGTG TGGGTTGCCG ATTGGCTGGG CGGCGGTTGG AcgcTGTTTC 

101 TAATGGCGGC AACCTTTGCC GCCGGTGTGC TGATGCTCAG GCATAcggGG 

151 CTGTCCGGTC TTTTATTGGC TGGCGCGGCG GTAAAAagta gtgGGAAGGT

201 ATCTGTTTAT CagatgtTGT GGCCTATCCG TTATAcggtg gcggcggtgT 

251 GTCTGatgag tCcggGATTC GTATCCTccg tgttggCGGT ATTGCTGCTG 

301 CTGCcgttta aggGaggGgc agtgttgcag gcaggaggtg cggaaaATTT 

351 TTTCAACATg aaCcaatcgg gcagaaAaga gggatttttc cacgatgacg 

401 atattatcga gggagaatat acggttgaaa aacctgacgg cggcaatcgt

451 tcccgaAAcg ccatcgaaca cgaaaAagac gaataA

This encodes a protein having amino acid sequence <SEQ ID 284>: 
1 MRFFGIGFLV LLFLEIMSIV WVADWLGGGW TLFLMAATFA AGVLMLRHTG

51 LSGLLLAGAA VKSSGKVSVY QMLWPIRYTV AAVCLMSPGF VSSVLAVLLL

101 LPFKGGAVLQ AGGAENFFNM NQSGRKEGFF HDDDIIEGEY TVEKPDGGNR 

151 SRNAIEHEKD E* 

ORF73ng and ORF73-1 show 93.8% identity in 161 aa overlap:



10 20 30 40 50 60 

orf7 3-l.pep MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAAGFAAGVLMLRHTGLSGLLLAGAA 
I | i i I t I I ! I I t I i I I i I I I i 1 i I I M i I M i I I I I I M 1 ! I I I i I t ! I I I M I i I I 1 I 
orf73ng MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAATFAAGVLMLRHTGLSGLLLAGAA 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 73-1. pep MRSGGRVSVYQMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM 
:: I : I : I I i I I I I I I I i i I M I I i 1 I I I I I I 11 I I f II i t I I i I I I I I I I f I If i i I i I I 
orf73ng VKSSGKVSVYQMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM 

70 80 90 100 110 120 



130 140 150 160 

orf 7 3-1. pep NQSGRKEGFSRDDDIIEGEYTVEEPYGGNRSRNAIEHKKDEX 
I I 1 I 1 I 1 I I : I I I I M I 1 I 1 I I : I 11111111111:1111 
orf7 3ng NQSGRKEGFFHDDDIIEGEYTVEKPDGGNRSRNAIEHEKDEX 

130 140 150 160 

Based on this analysis, including the presence of a putative leader sequence and putative 
transmembrane domain in the gonococcal protein, it is predicted that the proteins from 
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or 
diagnostics, or for raising antibodies. 



Example 34 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 285>: 



1 ATGTTTGTTT TTCAGACGGC ATTCTT.ATG TTTCAGAAAC ATTTGCAGAA 

51 AGCCTCCGAC AGCGTCGTCG GAGGGACATT ATACGTGGTT GCCACGCCCA 

101 TCGGCAATTT GGCGGACATT ACCCTGCGCG CTTTGGCGGT ATTGCAAAAG 

151 GCG GCCGA AGACACGCGC GTTACCGCAC AGCTTTTGAG 

201 CGCGTACGGC ATTCAGGGCA AACTCGTCAG TGTGCGCGAA CACAACGAAC 

251 GGCAGATGGC GGACAAGATT GTCGGCTATC TTTCAGACGG CATGGTTGTG 

301 GCACAGGTTT CCGATGCGGG TACGCCGGCC GTGTGCGACC CGGGCGCGAA 

351 ACTCGCCCGC CGCGTGCGTG AGGCCGGGTT TAAAGTCGTT CCCGTCGTGG 

401 GCGCAAC.GC GGTGATGGCG GCTTTGAGCG TGGCCGGTGT GGAAGGATCC 

451 GATTTTTATT TCAACGGTTT TGTACCGCCG AAATCGGGAG AACGCAGGAA 

501 ACTGTTTGCC AAATGGGTGC GGGCGGCGTT TCCTATCGTC ATGTTTGAAA 

551 CGCCGCACCG CATCGGTGCA GCGCTTGCCG ATATGGCGGA ACTGTTCCCC 

601 GAACGCCGAT TAATGCTGGC GCGCGAAATT ACGAAAACGT TTGAAACGTT 

651 CTTAAGCGGC ACGGTTGGGG AAATTCAGAC GGCATTGTCT GCCGACGGCG 

701 ACCAATCGCG CGGCGAGATG GTGTTGGTGC TTTATCCGGC GCAGGATGAA 

751 AAACACGAAG GCTTGTCCGA GTCCGCGCAA AACATCATGA AAATCCTCAC 

801 AGCCGAGCTG CCGACCAAAC AGGCGGCGGA GCTTGCTGCC AAAATCACGG 

851 GCGAGGGAAA GAAAGCTTTG TACGAT..

This corresponds to the amino acid sequence <SEQ ID 286; ORF75>: 



1 MFVFQTAFXM FQKHLQKASD SVVGGTLYVV ATPIGNLADI TLRALAVLQK

51 A....AEDTR VTAQLLSAYG IQGKLVSVRE HNERQMADKI VGYLSDGMVV

101 AQVSDAGTPA VCDPGAKLAR RVREAGFKVV PVVGAXAVMA ALSVAGVEGS

151 DFYFNGFVPP KSGERRKLFA KWVRAAFPIV MFETPHRIGA ALADMAELFP 

201 ERRLMLAREI TKTFETFLSG TVGEIQTALS ADGDQSRGEM VLVLYPAQDE 

251 KHEGLSESAQ NIMKILTAEL PTKQAAELAA KITGEGKKAL YD.. 



Further work revealed the complete nucleotide sequence <SEQ ID 287>: 
1 ATGTTTCAGA AACATTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC

51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCGGAC ATTACCCTGC

101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATCTGTGC CGAAGACACG

151 CGCGTTACCG CACAGCTTTT GAGCGCGTAC GGCATTCAGG GCAAACTCGT

201 CAGTGTGCGC GAACACAACG AACGGCAGAT GGCGGACAAG ATTGTCGGCT

251 ATCTTTCAGA CGGCATGGTT GTGGCACAGG TTTCCGATGC GGGTACGCCG

301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GTGAGGCCGG

351 GTTTAAAGTC GTTCCCGTCG TGGGCGCAAG CGCGGTGATG GCGGCTTTGA

401 GCGTGGCCGG TGTGGAAGGA TCCGATTTTT ATTTCAACGG TTTTGTACCG

451 CCGAAATCGG GAGAACGCAG GAAACTGTTT GCCAAATGGG TGCGGGCGGC

501 GTTTCCTATC GTCATGTTTG AAACGCCGCA CCGCATCGGT GCGACGCTTG

551 CCGATATGGC GGAACTGTTC CCCGAACGCC GATTAATGCT GGCGCGCGAA

601 ATTACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA

651 GACGGCATTG TCTGCCGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG

701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCCGCG

751 CAAAACATCA TGAAAATCCT CACAGCCGAG CTGCCGACCA AACAGGCGGC

801 GGAGCTTGCT GCCAAAATCA CGGGCGAGGG AAAGAAAGCT TTGTACGATC

851 TGGCTCTGTC TTGGAAAAAC AAATAG



This corresponds to the amino acid sequence <SEQ ID 288; ORF75-1>:

1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT

51 RVTAQLLSAY GIQGKLVSVR EHNERQMADK IVGYLSDGMV VAQVSDAGTP

101 AVCDPGAKLA RRVREAGFKV VPVVGASAVM AALSVAGVEG SDFYFNGFVP

151 PKSGERRKLF AKWVRAAFPI VMFETPHRIG ATLADMAELF PERRLMLARE

201 ITKTFETFLS GTVGEIQTAL SADGNQSRGE MVLVLYPAQD EKHEGLSESA

251 QNIMKILTAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K*



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF75 shows 95.8% identity over a 283aa overlap with an ORF (ORF75a) from strain A of N.
meningitidis:



orf 75 .pep 
orf75a 

orf 75. pep 
orf 75a 

orf 75. pep 
orf 75a 

orf 75. pep 
orf 75a 

orf 75 .pep 
orf 75a 



10 20 30 40 50 60 

MFVFQTAFXMFQKHLQKASDSWGGTLYWATPIGNLADITLRALAVLQKAXXXXAEDTR 
t II 1 I I I I I I It I i I I I I 1 I I I I ! I I t I I II I I I I I I I I I 1 I I I I I I 

MFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTR 
10 20 30 40 50 

70 80 90 100 110 120 

VTAQLLSAYGIQGKLVSVREHNERQMADKIVGYLSDGMWAQVSDAGTPAVCDPGAKLAR 
I I I 1 I I I I I II I I I II I I II I I I I I I I II I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I 
VTAQLLSAYGIQGKLVSVREHNERQMADKIVGYLSDGMWAQVSDAGTPAVCDPGAKLAR 
60 70 80 90 100 110 

130 140 150 160 170 180 

RVREAGF KVVPWGAXAVMAALSVA GVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIV 
1111:1111111111 I I I I I I I I I I I I I I 1 I I I I I I II I I I I II I I I I I I II : I I I : I 
RVREVGFK WPVVGASAVMAALSVA GVAGSDFYFNGFVPPKSGERRKLFAKWVRVAFPVV 
120 130 140 150 160 170 

190 200 210 220 230 240 

MFETPHRIGAALADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALSADGDQSRGEM 
I I I I I I I I I I : I I I II I I I I I I I II I I I I I I I I I I I I I I I I I I I I I II I : I I I : I I I I 1 I 
MFETPHRIGATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEM 
180 190 200 210 220 230 

250 260 270 280 290 

VLVLYPAQDEKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYD 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
VLVLYPAQDEKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYDLALSWKNK 
240 250 260 270 280 290 



orf75a X 

The complete length ORF75a nucleotide sequence <SEQ ID 289> is: 
1 ATGTTTCAGA AACATTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC

51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCGGAC ATTACCCTGC 

101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATCTGTGC CGAAGACACG 

151 CGCGTTACCG CGCAGCTTTT GAGCGCGTAC GGCATTCAGG GCAAACTCGT 

201 CAGCGTGCGC GAACACAACG AACGGCAGAT GGCGGACAAG ATTGTCGGCT 

251 ATCTTTCAGA CGGCATGGTT GTGGCACAGG TTTCCGATGC GGGTACGCCG 

301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GTGAGGTCGG 

351 GTTTAAAGTT GTCCCTGTTG TCGGCGCAAG CGCGGTGATG GCGGCTTTGA 

401 GTGTGGCTGG TGTGGCGGGA TCCGATTTTT ATTTCAACGG TTTTGTACCG 

451 CCGAAATCGG GCGAACGTAG GAAATTGTTT GCCAAATGGG TGCGGGTGGC 

501 GTTTCCCGTC GTGATGTTTG AAACGCCGCA CCGCATCGGG GCGACGCTTG 

551 CCGATATGGC GGAACTGTTC CCCGAACGCC GATTAATGCT GGCGCGCGAA 

601 ATCACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA 

651 GACGGCATTG GCGGCGGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG 

701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCCGCG 

751 CAAAACATCA TGAAAATCCT CACAGCCGAG CTGCCGACCA AACAGGCGGC 

801 GGAGCTTGCC GCCAAAATCA CGGGCGAGGG AAAAAAAGCT TTGTACGATC 

851 TGGCACTGTC TTGGAAAAAC AAATGA 

This encodes a protein having amino acid sequence <SEQ ID 290>: 

1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT

51 RVTAQLLSAY GIQGKLVSVR EHNERQMADK IVGYLSDGMV VAQVSDAGTP

101 AVCDPGAKLA RRVREVGFKV VPVVGASAVM AALSVAGVAG SDFYFNGFVP

151 PKSGERRKLF AKWVRVAFPV VMFETPHRIG ATLADMAELF PERRLMLARE 

201 ITKTFETFLS GTVGEIQTAL AADGNQSRGE MVLVLYPAQD EKHEGLSESA 

251 QNIMKILTAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K* 

ORF75a and ORF75-1 show 98.3% identity in 291 aa overlap: 

10 20 30 40 50 60 

orf 75a . pep M FQKHLQKAS D S WGGT L Y WAT P I GN LAD I T LRALAVLQKAD 1 1 CAE DTRVT AQLL S AY 
M II t I I I t f I I I i i f i II I ( I I I t i I I I II I II I i I 1 f I I i II I I I I ! II I I it i M ( t 
orf 75-1 MFQKHLQKAS DS WGGT LYWATP I GN LAD IT LRALAVLQKAD 1 1 CAE DTRVT AQLL SAY 

10 20 30 40 50 60 

70 80 90 100 110 120 

or f 7 5a . pep GIQGKLVSVREHNERQMADKIVGYLSDGMWAQVSDAGTPAVCDPGAKLARRVREVGFKV 
I II I I I I I I II I I I I I I I I I I I I I I I M I I I I II I I i I I I I I I I II I ! I I I I I I I : i I I I 
orf 7 5-1 GIQGKLVSVREHNERQMADKIVGYLSDGMWAQVSDAGTPAVCDPGAKLARRVREAGFKV 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 7 5a. pep VPWGASAVMAALSVAGVAGSDFYFNGFVPPKSGERRKLFAKWVRVAFPWMFETPHRIG 
I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I II 1 I I I I II I I : I I I : M I I I I I I I I 
orf 75-1 VPWGASAVMAALSVAGVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPI VMFETPHRIG 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 75a . pep m ATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQD 
I I I II I I I I I I i I I I I I I I I I I I I I I M I I I I I I I I I I I h I I I I I I I I I I I I I I I I I I I 
orf 75-1 ATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALSADGNQSRGEMVLVLYPAQD 

190 200 210 220 230 240 

250 260 270 280 290 

or f 7 5a . pep EKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYDLALSWKNKX 
I M I I I I I II I I I I I II I I I I I I I I II I I I I I I I I I I ! I I i I I I I I I I I I I I 
orf 75-1 EKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYDLALSWKNKX 

250 260 270 280 290 

Homology with a predicted ORF from N. gonorrhoeae 

ORF75 shows 93.2% identity over a 292aa overlap with a predicted ORF (ORF75.ng) from N.
gonorrhoeae:

orf 75 . pep M FV FQT AFXM FQKH LQKAS D S WGGT L Y WAT P I GN LAD I T LRALAVLQKA AEDTR 56 

I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I i I I Mill 
orf75ng MSVFQTAFFMFQKHLQKASDSWGGTLYWATPIGNLADITLRALAVLQKADIICAEtJTR 60 

orf75 net) VTAQLLSAYGIQGKLVSVREHNERQMADKIVGYLSDGMWAQVSDAGTPAVCDPGAKLAR 
||||MMIim:iMMMIIMMIh:|:llil:litmttltttlMtttltll 
orf75ng VTAQLLSAYGIQGRLVSVREHNERQMADKVIGFLSDGLWAQVSDAGTPAVCDPGAKLAR 120 

orf75 pep RVREAGFKWPWGAXAVMAALSVAGVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIV 17 6 

tllllillllMm I M I I I t I I M tiiMfMlliilllimittlllltthi 
orf75ng RVREAGFKVVPWGASAVMAALSVAGVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPW 180 

orf75 pep MFETPHRIGAALADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALSADGDQSRGEM 236 

M I t t I 1 t 11 : t I I I I M 1 I II I I I i I I I I I I I I t I I I i I I I I I f M t I : II t : i I I I i I 
orf7 5ng mfETPHRIGATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEM 24 0 

orf75 pep VLVLYPAQDEKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYD 288 

| U | ! I | 1 I t I I M i I I 1 M 1 lllhlllllllllllllilMMiiilli 
orf75ng VLVLYPAQDEKHEGLSESAQNAMKILAAELPTKQAAELAAKITGEGKKALYDLALSWKNK 300 

An ORF75ng nucleotide sequence <SEQ ID 291> was predicted to encode a protein having amino 
acid sequence <SEQ ID 292>: 

1 MSVFQTAFFM fqkhlqkasd SVVGGTLYVV ATPIGNLADI TLRALAVLQK

51 ADIICAEDTR VTAQLLSAYG IQGRLVSVRE hnerqmadkv IGFLSDGLVV

101 AQVSDAGTPA VCDPGAKLAR RVREAGFKVV PVVGASAVMA ALSVAGVAES

151 DFYFNGFVPP KSGERRKLFA KWVRAAFPVV MFETPHRIGA TLADMAELFP

201 ERRLMLAREI TKTFETFLSG TVGEIQTALA ADGNQSRGEM VLVLYPAQDE

251 KHEGLSESAQ NAMKILAAEL PTKQAAELAA KITGEGKKAL YDLALSWKNK 

301 * 

After further analysis, the following gonococcal DNA sequence <SEQ ID 293> was identified: 

1 ATGTTTCAGA AACACTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC 

51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCAGAC ATTACCCTGC 

101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATTTGTGC CGAAGACACG 

151 CGCGTTACTG CGCAGCTTTT GAGCGCGTAC GGCATTCAGG GCAGGTTGGT 

201 CAGTGTGCGC GAACACAACG AGCGGCAGAT GGCGGACAAG GTAATCGGTT 

251 TCCTTTCAGA CGGCCTGGTT GTGGCGCAGG TTTCCGATGC GGGTACGCCG 

301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GCGAAGCAGG 

351 GTTCAAAGTC GTTCCCGTCG TGGGCGCAAG CGCGGTAATG GCGGCGTTGA 

401 GTGTGGCCGG TGTGGCGGAA TCCGATTTTT ATTTCAACGG TTTTGTACCG

451 CCGAAATCGG GCGAACGTAG GAAATTGTTT GCCAAATGGG TGCGGGCGGC 

501 ATTTCCTGTC GTCATGTTTG AAACGCCGCA CCGAATCGGG GCAACGCTTG 

551 CCGATATGGC GGAATTGTTC CCCGAACGCC GTCTGATGCT GGCGCGCGAA 

601 ATCACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA 

651 GACGGCATTG GCGGCGGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG 

701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCTGCG 

751 CAAAATGCGA TGAAAATCCT TGCGGCCGAG CTGCCGACCA AGCAGGCGGC 

801 GGAGCTTGCC GCCAAGATTA CAGGTGAGGG CAAAAAGGCT TTGTACGATT 

851 TGGCACTGTC GTGGAAAAAC AAATGA 

This corresponds to the amino acid sequence <SEQ ID 294; ORF75ng-1>:

1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT

51 RVTAQLLSAY GIQGRLVSVR EHNERQMADK VIGFLSDGLV VAQVSDAGTP

101 AVCDPGAKLA RRVREAGFKV VPVVGASAVM AALSVAGVAE SDFYFNGFVP

151 PKSGERRKLF AKWVRAAFPV VMFETPHRIG ATLADMAELF PERRLMLARE

201 ITKTFETFLS GTVGEIQTAL AADGNQSRGE MVLVLYPAQD EKHEGLSESA

251 QNAMKILAAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K*

ORF75ng-1 and ORF75-1 show 96.2% identity in 291 aa overlap:



10 20 30 40 50 60 

orf 75-1 . pep M FQKH LQKASDSVVGGT L YVVAT P I GN LAD I T LRALAV LQKAD 1 1 C AE DT R VT AQLL SAY 
II I I I I I I I I i I I I I I I ! I I I M I I I I I I I M I I I I I I I I 1 I I I I I i I I I I I ! I I I I I I i 
orf75ng-l MFQKHLQKASDSWGGTLYWATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAY 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 75-1 . pep GIQGKLVSVREHMERQMADKIVGYLSDGMWAQVSDAGTPAVCDPGAKLARRVREAGFKV 
t | | | : | | | | | I ! I | I I I I I t : : ! : I I M : 1 I I I M I I M I I I I M 1 1 I I i i 1 i i i i i 1 i I 
orf75nq-l GIQGRLVSVREHNERQMADKVIGFLSDGLWAQVSDAGTPAVCDPGAKLARRVREAGFKV 
70 80 90 100 110 120 

130 140 150 160 170 180 

orf75-l pep VPVVGASAVMAALSVAGVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIVMFETPHRIG 
| ! 1 M I I I 1 1 1 I I i i I ! t I I I I I I 1 I I I I I I t M I I 1 I I I I 1 I I I I I : I I I I I I I I M 
orf75nq-l VPWGASAVMAALSVAGVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPWMFETPHRIG 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf 75-1 . pep ATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALSADGNQSRGEMVLVLYPAQD 
| M j I M | | | 1 1 I | I II I II I 1 I 1 11 1 I I 1 I I 1 II 1 I I I 1 : I i i M f I I I I I I 1 t f ! I I I 
orf75ng-l ATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQD 
15 190 200 210 220 230 240 

250 260 270 280 290 

orf 75-1. pep EKHEGLSES AQN I MK I LT AE L PT KQAAE LAAKI TGE GKKALY DLAL SWKNKX 
I I I I I I I i I I I I I 1 I I : I I I I I I I ! t I I I I I I I I I I I I I I 1 I I I ! I I M I I 
20 orf75ng-l EKHEGLSESAQNAMKILAAELPTKQAAELAAKITGEGKKALYDLAL SWKNKX 

250 260 270 280 290 

Furthermore, ORF75ng-1 shows significant homology to a hypothetical E. coli protein:

sp|P45528|YRAL_ECOLI HYPOTHETICAL 31.3 KD PROTEIN IN AGAI-MTR INTERGENIC REGION
(F286)

>gi|606086 (U18997) ORF_f286 [Escherichia coli]

>gi|1789535 (AE000395) hypothetical 31.3 kD protein in agai-mtr intergenic
region [Escherichia coli] Length = 286
Score = 218 bits (550), Expect = 3e-56

Identities = 128/284 (45%), Positives = 171/284 (60%), Gaps = 4/284 (1%)



Query: 4 KHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAYGIQ 63
K Q A +S G LY+V TPIGNLADIT RAL VLQ D+I AEDTR T LL +GI
Sbjct: 2 KQHQSADNSQ--GQLYIVPTPIGNLADITQRALEVLQAVDLIAAEDTRHTGLLLQHFGIN 59

Query: 64
RL ++ +HNE+Q A+ ++ L +G +A VSDAGTP + DPG L R REAG +WPf
Sbjct: 60

Query: 124
G A + ALS AG+ F + GF+P KS RR ++ +E+ HR+ +L
Sbjct: 120

Query: 184
D4- + E R ++LARE+TKT+ET VGE+ + D N+ +GEMVL++ +
Sbjct: 180 SDIVAVLGESRYVVLARELTKTWETIHGAPVGELLAWVKEDENRRKGEMVLIV-EGHKAQ 238

Query: 243 HEGLSESAQNAMKILAAELPTKQAAELAAKITGEGKKALYDLAL 286
EL A + +L AELP K+AA LAA+I G K ALY AL
Sbjct: 239 2EDLPADALRTLALLQAELPLKKAAALAAEIHGVKKNALYKYAL 282

Based on this analysis, including the presence of a putative transmembrane domain in the 
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 35 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 295>:

1 ATGAAACAGA AAAAAACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGCAGG

51 TTTTGCGGCA GC.AAAGCAC CCGAAATCGA CCCGGCTTTG

// 

651 GAGTTGG TCAGAAACCA GTTGGAGCAG GGTTTGAGAC 
701 AGGAAAAAGC CCGCTTGAAA ATCGATGCCC TTTTGGAAGA AAACGGTGTC
751 AAACCGTAA 

This corresponds to the amino acid sequence <SEQ ID 296; ORF76>: 

1 MKQKKTAAAV IAAMLAGFAA XKAPEIDPAL 

// 

201 ELVRNQLEQG LRQEKARLKI DALLEENGVK 

251 P* 

Further work revealed the complete nucleotide sequence <SEQ ID 297>: 

1 ATGAAACAGA AAAAAACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGCAGG 

51 TTTTGCGGCA GCCAAAGCAC CCGAAATCGA CCCGGCTTTG GTGGATACGC 

101 TGGTGGCGCA GATCATGCAG CAGGCAGACC GGCATGCGGA GCAGTCCCAA

151 AAACCGGACG GGCAGGCAAT CCGAAACGAT GCCGTCCGCC GGCTACAAAC 

201 TTTGGAAGTT TTGAAAAACA GGGCATTGAA GGAAGGTTTG GATAAGGATA 

251 AGGATGTCCA AAACCGCTTT AAAATCGCCG AAGCGTCTTT TTATGCCGAG 

301 GAGTACGTCC GTTTTCTGGA ACGTTCGGAA ACGGTTTCCG AAGACGAGCT 

351 GCACAAGTTT TACGAACAGC AAATCCGCAT GATCAAATTG CAGCAGGTCA 

401 GCTTCGCAAC CGAAGAGGAG GCGCGTCAGG CGCAGCAGCT CCTGCTCAAA 

451 GGGCTGTCTT TTGAAGGGCT GATGAAGCGT TATCCGAACG ACGAGCAGGC 

501 TTTTGACGGT TTCATTATGG CGCAGCAGCT TCCCGAGCCG CTGGCTTCGC 

551 AGTTTGCCGC GATGAATCGG GGCGACGTTA CCCGCGATCC GGTCAAATTG 

601 GGCGAACGCT ATTATCTGTT CAAACTCAGC GAGGTCGGGA AAAACCCCGA 

651 CGCGCAGCCT TTCGAGTTGG TCAGAAACCA GTTGGAGCAG GGTTTGAGAC 

701 AGGAAAAAGC CCGCTTGAAA ATCGATGCCC TTTTGGAAGA AAACGGTGTC 

751 AAACCGTAA

This corresponds to the amino acid sequence <SEQ ID 298; ORF76-1>:

1 MKQKKTAAAV IAAMLAGFAA AKAPEIDPAL VDTLVAQIMQ QADRHAEQSQ

51 KPDGQAIRND AVRRLQTLEV LKNRALKEGL DKDKDVQNRF KIAEASFYAE 

101 EYVRFLERSE TVSEDELHKF YEQQIRMIKL QQVSFATEEE ARQAQQLLLK 

151 GLSFEGLMKR YPNDEQAFDG FIMAQQLPEP LASQFAAMNR GDVTRDPVKL 

201 GERYYLFKLS EVGKNPDAQP FELVRNQLEQ GLRQEKARLK IDALLEENGV 

251 KP* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF76 shows 96.7% identity over a 30aa overlap and 96.8% identity over a 31aa overlap with an 
ORF (ORF76a) from strain A of N. meningitidis: 



10 20 30 

orf7 6.pep MKQKKTAAAV I AAM LAG FAAXKA PE IDPAL 
I i I I i I I I I I I f I t II I 1 I I II I i I I I I I 
orf 7 6a MKQKKTAAAVIAAMLAGFAAAKA PEIDPALVDTLVAQIMQQADRHAEQSQKPDGQAIRND 
10 20 30 40 50 60 

// 

70 80 90 

orf 7 6 . pep XELVRNQLEQGLRQEKARLKI DALLEENGVKPX 

I I I 1 I I I I I I I I I I I II I I I I I : II I I I I 11 ! 
orf 76a DVTRDPVKLGERYYLFKLSEVGKNPDAQPFELVRNQLEQGLRQEKARLKIDAILEENGVKPX 
200 210 220 230 240 250 

The complete length ORF76a nucleotide sequence <SEQ ID 299> is: 



1 ATGAAACAGA AAAAAACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGCAGG 

51 TTTTGCGGCA GCCAAAGCAC CCGAAATCGA CCCGGCTTTG GTGGATACGC 

101 TGGTGGCGCA GATCATGCAG CAGGCAGACC GGCATGCGGA GCAGTCCCAA 

151 AAACCGGACG GGCAGGCAAT CCGAAACGAT GCCGTCCGTC GGCTGCAAAC 

201 TTTGGAAGTT TTGAAAAACA GGGCATTGAA GGAAGGTTTG GATAAGGATA 

251 AGGATGTCCA AAACCGCTTT AAAATCGCCG AAGCGTCTTT TTATGCCGAG 

301 GAGTACGTCC GTTTTCTGGA ACGTTCGGAA ACGGTTTCCG AAAGCGCACT 

351 GCGTCAGTTT TATGAGCGGC AAATCCGCAT GATCAAATTG CAGCAGGTCA 
401 GCTTCGCAAC CGAAGAGGAG GCGCGTCAGG CGCAGCAGCT CCTGCTCAAA

451 GGGCTGTCTT TTGAAGGGCT GATGAAGCGT TATCCGAACG ACGAGCAGGC

501 TTTTGACGGT TTCATTATGG CGCAGCAGCT TCCCGAGCCG CTGGCTTCGC 

551 AGTTTGCAGC GATGAATCGG GGCGACGTTA CCCGCGATCC GGTCAAATTG 

601 GGCGAACGCT ATTATCTGTT CAAACTCAGC GAGGTCGGGA AAAACCCCGA

651 CGCGCAGCCT TTCGAGTTGG TCAGAAACCA GTTGGAACAA GGTTTGAGAC 

701 AGGAAAAAGC CCGCTTGAAA ATCGATGCCA TTTTGGAAGA AAACGGTGTC 

751 AAACCGTAA 

This encodes a protein having amino acid sequence <SEQ ID 300>: 

1 MKQKKTAAAV IAAMLAGFAA AKAPEIDPAL VDTLVAQIMQ QADRHAEQSQ

51 KPDGQAIRND AVRRLQTLEV LKNRALKEGL DKDKDVQNRF KIAEASFYAE 

101 EYVRFLERSE TVSESALRQF YERQIRMIKL QQVSFATEEE ARQAQQLLLK 

151 GLSFEGLMKR YPNDEQAFDG FIMAQQLPEP LASQFAAMNR GDVTRDPVKL 

201 GERYYLFKLS EVGKNPDAQP FELVRNQLEQ GLRQEKARLK IDAILEENGV 

251 KP*

ORF76a and ORF76-1 show 97.6% identity in 252 aa overlap: 

10 20 30 40 50 60 

or f 7 6a . pep MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQKPDGQAIRND 
I | I M I I I I II I I I I i I 1 i I I 1 I I I I I f I I I I I I I I I I t I I I I I I I i I I i ! I I I i I I I I I 
20 orf 7 6-1 MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQKPDGQAIRND 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 76a . pep AVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEAS FYAEEYVRFLERSETVSESALRQF 
25 I I i It I I I I i I i 1 I I I I I ! I i i I II I 1 I I i I I I I ! I 1 I 1 I t I I I I I I I M I t I I : I : : I 

orf 7 6-1 AVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEASFYAEEYVRFLERSETVSEDELHKF 

70 80 90 100 110 120 

130 140 150 160 170 180 

30 orf 7 6a . pep YERQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPNDEQAFDGFIMAQQLPEP 

I I : I I I I I I i I I 1 I t I I I I I I t I II 1 I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I t 1 I I 
orf 7 6-1 YEQQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPNDEQAFDGFIMAQQLPEP 
130 140 150 160 170 180 

35 190 200 210 220 230 240 

orf 7 6a . pep LASQFAAMNRGDVTRDPVKLGERYYLFKLSEVGKNPDAQPFELVRNQLEQGLRQEKARLK 
M I II I I I I I I I II I I I I I I ! I I I I I I I I I I I I I I I I f I I I M I I I f I I I I I 1 I I I I I I I 
orf 7 6-1 LASQFAAMNRGDVTRDPVKLGERYYLFKLSEVGKNPDAQPFELVRNQLEQGLRQEKARLK 
190 200 210 220 230 240 

40 

250 

orf 7 6a . pep IDAILEENGVKPX 
I I I : I I I II I I II 
o r f 7 6- 1 I DALLEENGVKPX 

45 250 

Homology with a predicted ORF from N. gonorrhoeae 

The aligned aa sequences of the N- and C-termini of ORF76 and a predicted ORF (ORF76.ng) from
N. gonorrhoeae show 96.7% and 100% identity in 30 aa and 31 aa overlaps, respectively:

50 orf 76. pep MKQKKTAAAV I AAM LAG FAAXKAPE I D PAL 30 

I II I I I I I I I I I I I I I I I I I I I I I I I II I 
orf 7 6ng MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQRPDGQAIRND 60 

// 

orf 7 6 . pep ELVRNQLEQGLRQEKARLKI DALLEENGVKP 251 

55 I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I 

orf7 6ng VTRNPVKLGERYYLFKLGAVGKNPDAQPFELVRNQLEQGLRQEKARLKI DALLEENGVKP 251 

The complete length ORF76ng nucleotide sequence <SEQ ID 301> is: 

1 ATGAAACAGA AAAAGACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGCAGG 
51 TTTTGCGGCA GCCAAAGCAC CCGAAATCGA CCCGGCTTTG GTGGATACGC 
101 TGGTGGCGCA GATCATGCAG CAGGCAGACC GGCATGCGGA GCAGTCCCAA
151 AGACCGGACG GGCAGGCAAT CCGAAACGAT GCCGTCCGCC GGCTGCAAAC

201 TTTGGAAGTT TTGAAAAACA GGGCATTGAA GGAAGGTTTG GATAAGGATA

251 AGGATGTCCA AAACCGCTTT AAAATCGCCG AAGCGTCTTT TTATGCCGAG

301 GAGTACGTCC GTTTTCTGGA ACGTTCGGAA ACGGTTTCCG AAAGCGCACT 

351 GCGTCAGTTT TATGAGCGGC AAATCCGCAT GATCAAATTG CAGCAGGTCA 

401 GCTTCGCAAC CGAAGAGGAG GCGCGTCAGG CGCAGCAGCT CCTGCTCAAA 

451 GGGCTGTCTT TTGAAGGGCT GATGAAGCGT TATCCGAACG ACGAGCAGGC

501 GTTCGACGGT TTCATTATGG CGCAGCAGCT TCCCGAGCCG CTGGCTTcgc 

551 agtttgCCGG TATGAACCGT GGCGACGTTA CCCGCAATCC GGTCAAATTG 

601 GGCGAACGCT ATTACCTGTT CAAACTCGGC GCGGTCGGGA AAAACCCCGA 

651 CGCGCAGCCT TTCGAGTTGG TCAGAAACCA GTTGGAACAA GGTTTGAGGC 

701 AGGAAAAAGC CCGCTTGAAA ATCGATGCCC TTTTGGAaga Aaacggtgtc 

751 AaacCGTAA 

This encodes a protein having amino acid sequence <SEQ ID 302>: 



1 MKQKKTAAAV IAAMLAGFAA AKAPEIDPAL VDTLVAQIMQ QADRHAEQSQ

51 RPDGQAIRND AVRRLQTLEV LKNRALKEGL DKDKDVQNRF KIAEASFYAE 

101 EYVRFLERSE TVSESALRQF YERQIRMIKL QQVSFATEEE ARQAQQLLLK 

151 GLSFEGLMKR YPNDEQAFDG FIMAQQLPEP LASQFAGMNR GDVTRNPVKL 

201 GERYYLFKLG AVGKNPDAQP FELVRNQLEQ GLRQEKARLK IDALLEENGV 

251 KP* 

ORF76ng and ORF76-1 show 96.0% identity in 252 aa overlap 



10 20 30 40 50 60 

orf 7 6- 1 . pep MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQKPDGQAIRND 
1 I | | II I 1 I I t I I I I I It I t I I I t I t I I I II I I I I ! I M I I I I I M I I I I : 1 I ! I I I I M 
orf76ng MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQRPDGQAIRND 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 7 6-1 . pep AVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEASFYAEEYVRFLERSETVSEDELHKF 
I M I I I I 1 I I 1 I H I I I I ! I II I I I I 1! I I I I I I I 1 1 I II I I I ! I 11 ! I i i M I : I : : I 
orf7 6ng AVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEAS FYAEEYVRFLERSETVSESALRQF 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 7 6-1. pep YEQQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPNDEQAFDGFIMAQQLPEP 
I I : I I I I I II I I I I I I I I I I I I I I I I I I I II II M I I I M I I I I I I I I I I I I I I I I I I I I 
orf7 6ng YERQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPNDEQAFDGFIMAQQLPEP 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 7 6-1 . pep LASQFAAMNRGDVTRDPVKLGERYYLFKLSEVGKNPDAQPFELVRNQLEQGLRQEECARLK 
I I I I I I : I 1 I I I I I I : II I! II I I I I i II : I I I I II I I I I I I I i I I I I I I I I I I I I I M 
orf7 6ng LASQFAGMNRGDVTRNPVKLGERYYLFKLGAVGKNPDAQPFELVRNQLEQGLRQEKARLK 
190 200 210 220 230 240 



250 

orf 7 6-1 . pep I DALLEENG VKPX 
I II I I I I I I I I I I 
or f 7 6ng I DALLEENGVKPX 

250 

Furthermore, ORF76ng shows significant homology to a B. subtilis export protein precursor:

sp|P24327|PRSA_BACSU PROTEIN EXPORT PROTEIN PRSA PRECURSOR >gi|98227|pir||S152
33K lipoprotein - Bacillus subtilis >gi|39782 (X57271) 33kDa lipoprotein
[Bacillus subtilis]

>gi|2226124|gnl|PID|e325181 (Y14077) 33kDa lipoprotein [Bacillus subtilis]
>gi|2633331|gnl|PID|e1182997 (Z99109) molecular chaperonin [Bacillus subtilis]
Length = 292
Score = 50.4 bits (118), Expect = 1e-05

Identities = 48/199 (24%), Positives = 82/199 (41%), Gaps = 32/199 (16%)



Query: 70 VLKNRALKEGLDK DKDVQNRFKI AEAS F YAEEYVRFLERSETVSE 114 

VL ++ LDK DK++ N+ K + Y ++Y++ + E +++ 

Sbjct: 53 VLTQLVQEKVLDKKYKVSDKEIDNKLKEYKTQLGDQYTALEKQYGKDYLKEQVKYELLTQ 112 
Query: 115 SA LRQFYERQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPN 163

A +++++E 1+ + A ++ A + ++ L KG FE L K Y 

Sbjct: 113 KAAKDNIKVTDADIKEYWEGLKGKIRASHILVADKKTAEEVEKKLKKGEKFEDLAKEYST 172 

Query: 164 DEQAFDG FIMAQQLPEPLASQFAAMNRGDVTRDPVKLGERYYLFKLSEVGKNPDA 218

DA G F Q+E+ + G+V+ DPVK Y++ K +E D 

Sbjct: 173 DSSASKGGDLGWFAKEGQMDETFSKAAFKLKTGEVS-DPVKTQYGYHIIKKTEERGKYDD 231 

Query: 219 QPFELVRNQLEQGLRQEKA 237

EL LEQ L A 
Sbjct: 232 MKKELKSEVLEQKLNDNAA 250 

Based on this analysis, including the presence of a putative leader sequence and an RGD motif in
the gonococcal protein, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
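
The location of the RGD motif in the gonococcal sequence can be illustrated with a small search sketch. It assumes the 252 aa ORF76ng sequence of SEQ ID 302 as listed above (with the stop marker omitted) and reports 1-based residue positions; the helper name `find_motif` is hypothetical and is not the tool used in the original analysis.

```python
def find_motif(protein: str, motif: str) -> list[int]:
    """Return the 1-based start position of every occurrence of motif."""
    positions = []
    start = protein.find(motif)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to 1-based
        start = protein.find(motif, start + 1)
    return positions


# ORF76ng amino acid sequence (SEQ ID 302), 50 residues per line
ORF76NG = (
    "MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQ"
    "RPDGQAIRNDAVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEASFYAE"
    "EYVRFLERSETVSESALRQFYERQIRMIKLQQVSFATEEEARQAQQLLLK"
    "GLSFEGLMKRYPNDEQAFDGFIMAQQLPEPLASQFAGMNRGDVTRNPVKL"
    "GERYYLFKLGAVGKNPDAQPFELVRNQLEQGLRQEKARLKIDALLEENGV"
    "KP"
)

print(find_motif(ORF76NG, "RGD"))  # [190]
```

The single hit spans residues 190 to 192, within the segment `GMNRGDVTRN` of the listing.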

ORF76-1 (27.8 kDa) was cloned in the pET vector and expressed in E. coli, as described above. The
products of protein expression and purification were analyzed by SDS-PAGE. Figure 10A shows
the results of affinity purification of the His-fusion protein. Purified His-fusion protein was used
to immunise mice, whose sera were used for Western blot (Figure 10B), ELISA (positive result),
and FACS analysis (Figure 10C). These experiments confirm that ORF76-1 is a surface-exposed
protein, and that it is a useful immunogen.
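
A molecular mass such as the one quoted for ORF76-1 can be roughly estimated from an amino acid sequence. The sketch below is an assumption-laden illustration, not the method used in the original work: it sums average residue masses and adds one water molecule, and it ignores any fusion tag, leader peptide cleavage, or post-translational modification. The names `RESIDUE_MASS` and `average_mass` are hypothetical.

```python
# Average residue masses in daltons (amino acid minus one water molecule).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594,
    "N": 114.1038, "D": 115.0886, "Q": 128.1307, "K": 128.1741,
    "E": 129.1155, "M": 131.1926, "H": 137.1411, "F": 147.1766,
    "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # added once for the free N- and C-termini


def average_mass(protein: str) -> float:
    """Estimate the average mass (Da) of an unmodified polypeptide.

    A trailing '*' (stop marker) and whitespace are ignored; any other
    unknown symbol, such as 'X', raises ValueError.
    """
    seq = "".join(protein.split()).rstrip("*")
    unknown = set(seq) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


# Usage: pass a sequence as printed in the listing, e.g. its first block.
print(f"{average_mass('MKQKKTAAAV') / 1000:.2f} kDa")
```

Applied to the full ORF76-1 sequence this gives only an approximate figure; the value reported above may reflect a different construct or calculation.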

Example 36 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 303>: 

1 ATGAAAAAAT CTTTCCTTAC GCTTGTTCTG TATTCGTCTT TACTTACCGC

51 CAGCGAAATT GCCTTACCCC TTGGAATTGG GGATTGAAAC CTTACCGGCG 

101 GCAAAAATTG CGGAAACGTT TGCGCTGACA TTTGTGATTG CTGCGCTGTA 

151 TCTGTTTGCG CGTAATAAGG TGACGCGTTT GTTGATTGCG GTGTTTTTTG 

201 CGTTCAGCAT TATTGCCAAC AATGTGCATT ACGCGGATTA TCAAAGCTGG 

251 ATGACG

// 

1201 CAAACCGTAT TCGAGCAGCT GCAAAAGACT CCTGACGGCA 

1251 ACTGGCTGTT TGCCTATACC TCCGATCATG GCCAGTATGT TCGCCAAGAT 

1301 ATCTACAATC AAGGCACGGT GCAGCCCGAC AGCTATCTCG TGCCGCTAGT 

1351 GTTGTACAGC CCGGATAAGG CCGTGCAACA GGCTGCCAAC CAGGCTTTTG

1401 CGCCTTGCGA GATTGCCTTC CATCAGCAGC TTTCAACGTT CCTGATTCAC 

1451 ACGTTGGGCT ACGATATGCC GGTTTCAGGT TGTCGCGAAG GCTCGGTAAC 

1501 GGGCAACCTG ATTACGGGTG ATGCAGGCAG CTTGAACATT CGCGACGGCA 

1551 AGGCGGAATA TGTTTATCCG CAATGA 

This corresponds to the amino acid sequence <SEQ ID 304; ORF81>:

1 MKKSFLTLVL YSSLLTASEI AYPLELGIET LPAAKIAETF ALTFVIAALY 

51 LFARNKVTRL LIAVFFAFSI IANNVHYADY QSWMT 

// 

401 ...QTVFEQL QKTPDGNWLF AYTSDHGQYV RQDIYNQGTV QPDSYLVPLV 
451 LYSPDKAVQQ AANQAFAPCE IAFHQQLSTF LIHTLGYDMP VSGCREGSVT

501 GNLITGDAGS LNIRDGKAEY VYPQ* 

Further work revealed the complete nucleotide sequence <SEQ ID 305>: 



1 ATGAAAAAAT CTTTCCTTAC GCTTGTTCTG TATTCGTCTT TACTTACCGC
51 CAGCGAAATT GCCTATCGCT TTGTATTTGG GATTGAAACC TTACCGGCGG
101 CAAAAATTGC GGAAACGTTT GCGCTGACAT TTGTGATTGC TGCGCTGTAT
151 CTGTTTGCGC GTTATAAGGT GACGCGTTTG TTGATTGCGG TGTTTTTTGC
201 GTTCAGCATT ATTGCCAACA ATGTGCATTA CGCGGTTTAT CAAAGCTGGA
251 TGACGGGCAT CAATTATTGG CTGATGCTGA AAGAGGTTAC CGAAGTCGGC
301 AGCGCGGGTG CGTCGATGTT GGATAAGTTG TGGCTGCCTG TGTTGTGGGG
351 CGTGTTGGAA GTCATGTTGT TTTGCAGCCT TGCCAAGTTC CGCCGTAAGA
401 CGCATTTTTC TGCCGATATA CTGTTTGCCT TCCTAATGCT GATGATTTTC
451 GTGCGTTCGT TCGACACGAA ACAAGAGCAC GGTATTTCGC CCAAACCGAC
501 ATACAGCCGC ATCAAAGCCA ATTATTTCAG CTTCGGTTAT TTTGTCGGAC
551 GCGTGTTGCC GTATCAGTTG TTTGATTTAA GCAGGATTCC CGCCTTTAAG
601 CAGCCTGCTC CAAGCAAAAT CGGGCAGGGC AGTGTTCAAA ATATCGTCCT
651 GATTATGGGC GAAAGCGAAA GCGCGGCGCA TTTGAAGCTG TTTGGCTACG
701 GACGCGAAAC TTCGCCGTTT TTAACCCGGC TGTCGCAAGC CGATTTTAAG
751 CCGATTGTGA AACAAAGTTA TTCCGCAGGC TTTATGACTG CAGTGTCCCT
801 GCCCAGTTTT TTCAATGCGA TACCGCACGC CAACGGCTTG GAACAAATCA
851 GCGGCGGCGA TACCAATATG TTCCGCCTCG CCAAAGAGCA GGGCTATGAA
901 ACGTATTTTT ACAGCGCGCA GGCGGAAAAC GAGATGGCGA TTTTGAACTT
951 AATCGGTAAG AAATGGATAG ACCATCTGAT TCAGCCGACG CAACTTGGCT
1001 ACGGCAACGG CGACAATATG CCCGATGAGA AGCTGCTGCC GTTGTTCGAC
1051 AAAATCAATT TGCAGCAGGG CAAGCATTTT ATCGTGTTGC ACCAACGCGG
1101 TTCGCACGCC CCATACGGCG CATTGTTGCA GCCTCAAGAT AAAGTATTCG
1151 GCGAAGCCGA TATTGTGGAT AAGTACGACA ACACCATCCA CAAAACCGAC
1201 CAAATGATTC AAACCGTATT CGAGCAGCTG CAAAAGCAGC CTGACGGCAA
1251 CTGGCTGTTT GCCTATACCT CCGATCATGG CCAGTATGTT CGCCAAGATA
1301 TCTACAATCA AGGCACGGTG CAGCCCGACA GCTATCTCGT GCCGCTAGTG
1351 TTGTACAGCC CGGATAAGGC CGTGCAACAG GCTGCCAACC AGGCTTTTGC
1401 GCCTTGCGAG ATTGCCTTCC ATCAGCAGCT TTCAACGTTC CTGATTCACA
1451 CGTTGGGCTA CGATATGCCG GTTTCAGGTT GTCGCGAAGG CTCGGTAACG
1501 GGCAACCTGA TTACGGGTGA TGCAGGCAGC TTGAACATTC GCGACGGCAA
1551 GGCGGAATAT GTTTATCCGC AATGA



This corresponds to the amino acid sequence <SEQ ID 306; ORF81-1>:

1 MKKSFLTLVL YSSLLTASEI AYRFVFGIET LPAAKIAETF ALTFVIAALY
51 LFARYKVTRL LIAVFFAFSI IANNVHYAVY QSWMTGINYW LMLKEVTEVG
101 SAGASMLDKL WLPVLWGVLE VMLFCSLAKF RRKTHFSADI LFAFLMLMIF
151 VRSFDTKQEH GISPKPTYSR IKANYFSFGY FVGRVLPYQL FDLSRIPAFK
201 QPAPSKIGQG SVQNIVLIMG ESESAAHLKL FGYGRETSPF LTRLSQADFK
251 PIVKQSYSAG FMTAVSLPSF FNAIPHANGL EQISGGDTNM FRLAKEQGYE
301 TYFYSAQAEN EMAILNLIGK KWIDHLIQPT QLGYGNGDNM PDEKLLPLFD
351 KINLQQGKHF IVLHQRGSHA PYGALLQPQD KVFGEADIVD KYDNTIHKTD
401 QMIQTVFEQL QKQPDGNWLF AYTSDHGQYV RQDIYNQGTV QPDSYLVPLV
451 LYSPDKAVQQ AANQAFAPCE IAFHQQLSTF LIHTLGYDMP VSGCREGSVT
501 GNLITGDAGS LNIRDGKAEY VYPQ*


Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF81 shows 84.7% identity over an 85aa overlap and 99.2% identity over a 121aa overlap with
an ORF (ORF81a) from strain A of N. meningitidis:



                    10        20        30        40        50        60
orf81.pep   MKKSFLTLVLYSSLLTASEIAYPLELGIETLPAAKIAETFALTFVIAALYLFARNKVTRL
            ||||:::| ||||||||||||| : :|||||||||:|||||||||||||||||| |:|||
orf81a      MKKSLFVLFLYSSLLTASEIAYRFVFGIETLPAAKMAETFALTFVIAALYLFARYKATRL
                    10        20        30        40        50        60

                    70        80
orf81.pep   LIAVFFAFSIIANNVHYADYQSWMT
            |||||||||||||||||| ||||:|
orf81a      LIAVFFAFSIIANNVHYAVYQSWITGINYWLMLKEITEVGGAGASMLDKLWLPALWGVLE
                    70        80        90       100       110       120

// 

120 130 140 

orf81.pep                                 QTVFEQLQKTPDGNWLFAYTSDHGQYVRQD
                                          ||||||||| ||||||||||||||||||||
orf81a      IPHANGLEQISGGDIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLFAYTSDHGQYVRQD
280 290 300 310 320 330

150 160 170 180 190 200 

orf81.pep   IYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTFLIHTLGYDMPVSG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf81a      IYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTFLIHTLGYDMPVSG

340 350 360 370 380 390 

210 220 230 

orf81.pep   CREGSVTGNLITGDAGSLNIRDGKAEYVYPQX
            ||||||||||||||||||||||||||||||||
orf81a      CREGSVTGNLITGDAGSLNIRDGKAEYVYPQX
400 410 420 
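
The identity percentages quoted for such alignments can be reproduced by counting matching columns. The sketch below is an illustration only: it assumes the two sequences are already aligned to equal length (with `-` for any gap) and uses a hypothetical helper name, `percent_identity`; it is not the alignment program used to generate the figures in this document.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned columns in which both sequences carry the same residue.

    Both strings must already be aligned (equal length, '-' marking gaps).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    same = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * same / len(a)


# N-terminal 85 residues of ORF81 and the corresponding region of ORF81a.
orf81_n = ("MKKSFLTLVLYSSLLTASEIAYPLELGIETLPAAKIAETFALTFVIAALY"
           "LFARNKVTRLLIAVFFAFSIIANNVHYADYQSWMT")
orf81a_n = ("MKKSLFVLFLYSSLLTASEIAYRFVFGIETLPAAKMAETFALTFVIAALY"
            "LFARYKATRLLIAVFFAFSIIANNVHYAVYQSWIT")

print(f"{percent_identity(orf81_n, orf81a_n):.1f}%")  # 84.7%
```

Here 72 of the 85 aligned positions are identical, which gives the 84.7% stated for the N-terminal overlap.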

The complete length ORF81a nucleotide sequence <SEQ ID 307> is: 

1 ATGAAAAAAT CCCTTTTCGT TCTCTTTCTG TATTCGTCCC TACTTACTGC 

51 CAGCGAAATT GCTTATCGCT TTGTATTCGG AATTGAAACC TTACCGGCTG 

101 CAAAAATGGC AGAAACGTTT GCGCTGACAT TTGTGATTGC TGCGCTGTAT 

151 CTGTTTGCGC GTTATAAGGC AACGCGTTTG TTGATTGCGG TGTTTTTCGC 

201 GTTCAGCATT ATTGCCAACA ATGTGCATTA CGCGGTTTAT CAAAGCTGGA 

251 TAACGGGCAT TAATTATTGG CTGATGCTGA AAGAGATTAC CGAAGTTGGC 

301 GGCGCAGGGG CGTCGATGTT GGATAAGTTG TGGCTGCCTG CGTTGTGGGG 

351 CGTGTTGGAA GTCATGTTGT TTTGCAGCCT TGCCAAGTTC CGCCGTAAGA 

401 CGCATTTTTC TGCCGATATA CTGTTTGCCT TCCTAATGCT GATGATTTTC 

451 GTGCGTTCGT TCGACACGAA ACAAGAACAC GGTATTTCGC CCAAACCGAC

501 ATACAGCCGC ATCAAAGCCA ATTATTTCAG CTTCGGTTAT TTTGTCGGAC 

551 GCGTGTTGCC GTATCAGTTG TTTGATTTAA GCAAGATTCC TGTGTTCAAA 

601 CAGCCTGCTC CAAGCAGAAT CGGGCAAGGC AGTATTCAAA ATATCGTCCT 

651 GATTATGGGC GAAAGCGAAA GCGCGGCGCA TTTGAAATTG TTTGGCTACG 

701 GGCGCGAAAC TTCGCCGTTT TTGACCCAGC TTTCGCAAGC CGATTTTAAG 

751 CCGATTGTGA AACAAAGTTA TTCCGCAGGC TTTATGACGG CAGTATCCCT 

801 GCCCAGTTTC TTTAACGTCA TACCGCATGC CAACGGCTTG GAACAAATCA 

851 GCGGCGGCGA TATTGTGGAT AAGTACGACA ACACCATCCA CAAAACCGAC 

901 CAAATGATTC AAACCGTATT CGAGCAGCTG CAAAAGCAGC CTGACGGCAA 

951 CTGGCTGTTT GCCTATACCT CCGATCATGG CCAGTATGTT CGCCAAGATA 

1001 TCTACAATCA AGGCACGGTG CAGCCCGACA GCTATCTCGT GCCGCTGGTG 

1051 TTGTACAGCC CGGATAAGGC CGTGCAACAG GCTGCCAACC AGGCTTTTGC 

1101 GCCTTGCGAG ATTGCCTTCC ATCAGCAGCT TTCAACGTTC CTGATTCACA 

1151 CGTTGGGCTA CGATATGCCG GTTTCAGGTT GTCGCGAAGG CTCGGTAACG 

1201 GGCAACCTGA TTACGGGTGA TGCAGGCAGC TTGAACATTC GCGACGGCAA 

1251 GGCGGAATAT GTTTATCCGC AATGA 

This encodes a protein having amino acid sequence <SEQ ID 308>: 

1 MKKSLFVLFL YSSLLTASEI AYRFVFGIET LPAAKMAETF ALTFVIAALY

51 LFARYKATRL LIAVFFAFSI IANNVHYAVY QSWITGINYW LMLKEITEVG

101 GAGASMLDKL WLPALWGVLE VMLFCSLAKF RRKTHFSADI LFAFLMLMIF

151 VRSFDTKQEH GISPKPTYSR IKANYFSFGY FVGRVLPYQL FDLSKIPVFK 

201 QPAPSRIGQG SIQNIVLIMG ESESAAHLKL FGYGRETSPF LTQLSQADFK 

251 PIVKQSYSAG FMTAVSLPSF FNVIPHANGL EQISGGDIVD KYDNTIHKTD 

301 QMIQTVFEQL QKQPDGNWLF AYTSDHGQYV RQDIYNQGTV QPDSYLVPLV 

351 LYSPDKAVQQ AANQAFAPCE IAFHQQLSTF LIHTLGYDMP VSGCREGSVT 

401 GNLITGDAGS LNIRDGKAEY VYPQ*

ORF81a and ORF81-1 show 77.9% identity in 524 aa overlap: 

                    10        20        30        40        50        60
orf81a.pep  MKKSLFVLFLYSSLLTASEIAYRFVFGIETLPAAKMAETFALTFVIAALYLFARYKATRL
            ||||:::| ||||||||||||||||||||||||||:||||||||||||||||||||:|||
orf81-1     MKKSFLTLVLYSSLLTASEIAYRFVFGIETLPAAKIAETFALTFVIAALYLFARYKVTRL
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf81a.pep  LIAVFFAFSIIANNVHYAVYQSWITGINYWLMLKEITEVGGAGASMLDKLWLPALWGVLE
            |||||||||||||||||||||||:|||||||||||:||||:||||||||||||:||||||
orf81-1     LIAVFFAFSIIANNVHYAVYQSWMTGINYWLMLKEVTEVGSAGASMLDKLWLPVLWGVLE
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf81a.pep  VMLFCSLAKFRRKTHFSADILFAFLMLMIFVRSFDTKQEHGISPKPTYSRIKANYFSFGY
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf81-1     VMLFCSLAKFRRKTHFSADILFAFLMLMIFVRSFDTKQEHGISPKPTYSRIKANYFSFGY
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf81a.pep  FVGRVLPYQLFDLSKIPVFKQPAPSRIGQGSIQNIVLIMGESESAAHLKLFGYGRETSPF
            ||||||||||||||:||:|||||||:|||||:||||||||||||||||||||||||||||
orf81-1     FVGRVLPYQLFDLSRIPAFKQPAPSKIGQGSVQNIVLIMGESESAAHLKLFGYGRETSPF
                   190       200       210       220       230       240

                   250       260       270       280
orf81a.pep  LTQLSQADFKPIVKQSYSAGFMTAVSLPSFFNVIPHANGLEQISGGD
            ||:|||||||||||||||||||||||||||||:||||||||||||||
orf81-1     LTRLSQADFKPIVKQSYSAGFMTAVSLPSFFNAIPHANGLEQISGGDTNMFRLAKEQGYE
                   250       260       270       280       290       300

orf81a.pep

orf81-1     TYFYSAQAENEMAILNLIGKKWIDHLIQPTQLGYGNGDNMPDEKLLPLFDKINLQQGKHF
                   310       320       330       340       350       360

                                       290       300       310       320
orf81a.pep                             IVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLF
                                       |||||||||||||||||||||||||||||||||
orf81-1     IVLHQRGSHAPYGALLQPQDKVFGEADIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLF
                   370       380       390       400       410       420

                   330       340       350       360       370       380
orf81a.pep  AYTSDHGQYVRQDIYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf81-1     AYTSDHGQYVRQDIYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTF
                   430       440       450       460       470       480

                   390       400       410       420
orf81a.pep  LIHTLGYDMPVSGCREGSVTGNLITGDAGSLNIRDGKAEYVYPQX
            |||||||||||||||||||||||||||||||||||||||||||||
orf81-1     LIHTLGYDMPVSGCREGSVTGNLITGDAGSLNIRDGKAEYVYPQX
                   490       500       510       520

Homology with a predicted ORF from N. gonorrhoeae 

The aligned aa sequences of the N- and C-termini of ORF81 and a predicted ORF (ORF81.ng) from
N. gonorrhoeae show 82.4% and 97.5% identity in 85 and 121 aa overlap, respectively:

orf81.pep   MKKSFLTLVLYSSLLTASEIAYPLELGIETLPAAKIAETFALTFVIAALYLFARNKVTRL  60
            ||||:::| ||||||||||||| : :|||||||||:||||||||:||||||||| |::||
orf81ng     MKKSLFVLFLYSSLLTASEIAYRFVFGIETLPAAKMAETFALTFMIAALYLFARYKASRL  60

orf81.pep   LIAVFFAFSIIANNVHYADYQSWMT  85
            |||||||||:|||||||| ||||||
orf81ng     LIAVFFAFSMIANNVHYAVYQSWMTGINYWLMLKEVTEVGSAGASMLDKLWLPALWGVAE  120

//

orf81.pep                                 QTVFEQLQKTPDGNWLFAYTSDHGQYVRQD  433
                                          ||||||||| ||||||||||||||||||||
orf81ng     ALLQPQDKVFGEADIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLFAYTSDHGQYVRQD  433

orf81.pep   IYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTFLIHTLGYDMPVSG  493
            ||||||||||||:|||||||||||||||||||||||||||||||||||||||||||||||
orf81ng     IYNQGTVQPDSYIVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTFLIHTLGYDMPVSG  493

orf81.pep   CREGSVTGNLITGDAGSLNIRDGKAEYVYPQ  524
            |||||||||||||||||||||:|||||||||
orf81ng     CREGSVTGNLITGDAGSLNIRNGKAEYVYPQ  524

The complete length ORF81ng nucleotide sequence <SEQ ID 309> is:

1 ATGAAAAAAT CCCTTTTCGT TCTCTTTCTG TATTCATCCC TACTTACCGC

51 CAGCGAAATC GCCTATCGCT TTGTATTCGG AATTGAAACC TTACCGGCTG 

101 CAAAAATGGC GGAAACGTTT GCGCTGACAT TTATGATTGC TGCGCTGTAT 

151 CTGTTTGCGC GTTATAAGGC TTCGCGGCTG CTGATTGCGG TGTTTTTCGC 

201 GTTCAGCATG ATTGCCAACA ATGTGCATTA CGCGGTTTAT CAAAGCTGGA 

251 TGACGGGTAT TAACTATTGG CTGATGCTGA AAGAGGTTAC CGAAGTCGGC 

301 AGCGCGGGCG CGTCGATGTT GGATAAGTTG TGGCTGCCTG CTTTGTGGGG 

351 CGTGGCGGAA GTCATGTTGT TTTGCAGCCT TGCCAAGTTC CGCCGTAAGA 

401 CGCATTTTTC TGCCGATATA CTGTTTGCCT TCCTAATGCT GATGATTTTC

451 GTGCGTTCGT TCGACACGAA ACAAGAGCAC GGTATTTCGC CCAAACCGAC

501 ATACAGCCGC ATCAAAGCCA ATTATTTCAG CTTCGGTTAT TTTGTCGGGC 

551 GCGTGTTGCC GTATCAGTTG TTTGATTTAA GCAAGATCCC TGTGTTCAAA 

601 CAGCCTGCTC CAAGCAAAAT CGGGCAAGGC AGTATTCAAA ATATCGTCCT 

651 GATTATGGGC GAAAGCGAAA GCGCGGCGCA TTTGAAATTG TTTGGTTACG 

701 GGCGCGAAAC TTCGCCGTTT TTAACCCGGC TGTCGCAAGC CGATTTTAAG 

751 CCGATTGTGA AACAAAGTTA TTCCGCAGGC TTTATGACGG CAGTATCCCT 

801 GCCCAGTTTC TTTAACGTCA TACCGCACGC CAACGGCTTG GAACAAATCA 

851 GCGGCGGCGA TACCAATATG TTCCGCCTCG CCAAAGAGCA GGGCTATGAA 

901 ACGTATTTTT ACAGTGCCCA GGCTGAAAAC CAAATGGCAA TTTTGAACTT 

951 AATCGGTAAG AAATGGATAG ACCATCTGAT TCAGCCGACG CAACTTGGCT 

1001 ACGGCAACGG CGACAATATG CCCGATGAGA AGCTGCTGCC GTTGTTCGAC 

1051 AAAATCAATT TGCAGCAGGG CAGGCATTTT ATCGTGTTGC ACCAACGCGG 

1101 TTCGCACGCC CCATACGGCG CATTGTTGCA GCCTCAAGAT AAAGTATTCG 

1151 GCGAAGCCGA TATTGTGGAT AAGTACGACA ACACCATCCA CAAAACCGAC 

1201 CAAATGATTC AAACCGTATT CGAGCAGCTG CAAAAGCAGC CTGACGGCAA 

1251 CTGGCTGTTT GCCTATACCT CCGATCATGG CCAGTATGTG CGCCAAGATA 

1301 TCTACAATCA AGGCACGGTG CAGCCCGACA GCTATATTGT GCCTCTGGTT 

1351 TTGTACAGCC CGGATAAGGC CGTGCAACAG GCTGCCAACC AGGCTTTTGC 

1401 GCCTTGCGAG ATTGCCTTCC ATCAGCAGCT TTCAACGTTC CTGATTCACA 

1451 CGTTGGGCTA CGATATGCCG GTTTCAGGTT GTCGCGAAGG CTCGGTAACA 

1501 GGCAACCTGA TTACGGGCGA TGCAGGCAGC TTGAACATTC GCAACGGCAA 

1551 GGCGGAATAT GTTTATCCGC AATAA 

This encodes a protein having amino acid sequence <SEQ ID 310>: 

1 MKKSLFVLFL YSSLLTASEI AYRFVFGIET LPAAKMAETF ALTFMIAALY 

51 LFARYKASRL LIAVFFAFSM IANNVHYAVY QSWMTGINYW LMLKEVTEVG

101 SAGASMLDKL WLPALWGVAE VMLFCSLAKF RRKTHFSADI LFAFLMLMIF

151 VRSFDTKQEH GISPKPTYSR IKANYFSFGY FVGRVLPYQL FDLSKIPVFK 

201 QPAPSKIGQG SIQNIVLIMG ESESAAHLKL FGYGRETSPF LTRLSQADFK 

251 PIVKQSYSAG FMTAVSLPSF FNVIPHANGL EQISGGDTNM FRLAKEQGYE 

301 TYFYSAQAEN QMAILNLIGK KWIDHLIQPT QLGYGNGDNM PDEKLLPLFD 

351 KINLQQGRHF IVLHQRGSHA PYGALLQPQD KVFGEADIVD KYDNTIHKTD 

401 QMIQTVFEQL QKQPDGNWLF AYTSDHGQYV RQDIYNQGTV QPDSYIVPLV

451 LYSPDKAVQQ AANQAFAPCE IAFHQQLSTF LIHTLGYDMP VSGCREGSVT 

501 GNLITGDAGS LNIRNGKAEY VYPQ* 

ORF81ng and ORF81-1 show 96.4% identity in 524 aa overlap: 

                       10        20        30        40        50        60
orf81ng-1.pep  MKKSLFVLFLYSSLLTASEIAYRFVFGIETLPAAKMAETFALTFMIAALYLFARYKASRL
               ||||:::| ||||||||||||||||||||||||||:||||||||:|||||||||||::||
orf81-1        MKKSFLTLVLYSSLLTASEIAYRFVFGIETLPAAKIAETFALTFVIAALYLFARYKVTRL
                       10        20        30        40        50        60

                       70        80        90       100       110       120
orf81ng-1.pep  LIAVFFAFSMIANNVHYAVYQSWMTGINYWLMLKEVTEVGSAGASMLDKLWLPALWGVAE
               |||||||||:|||||||||||||||||||||||||||||||||||||||||||:|||| |
orf81-1        LIAVFFAFSIIANNVHYAVYQSWMTGINYWLMLKEVTEVGSAGASMLDKLWLPVLWGVLE
                       70        80        90       100       110       120

                      130       140       150       160       170       180
orf81ng-1.pep  VMLFCSLAKFRRKTHFSADILFAFLMLMIFVRSFDTKQEHGISPKPTYSRIKANYFSFGY
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf81-1        VMLFCSLAKFRRKTHFSADILFAFLMLMIFVRSFDTKQEHGISPKPTYSRIKANYFSFGY
                      130       140       150       160       170       180

                      190       200       210       220       230       240
orf81ng-1.pep  FVGRVLPYQLFDLSKIPVFKQPAPSKIGQGSIQNIVLIMGESESAAHLKLFGYGRETSPF
               ||||||||||||||:||:|||||||||||||:||||||||||||||||||||||||||||
orf81-1        FVGRVLPYQLFDLSRIPAFKQPAPSKIGQGSVQNIVLIMGESESAAHLKLFGYGRETSPF
                      190       200       210       220       230       240

                      250       260       270       280       290       300
orf81ng-1.pep  LTRLSQADFKPIVKQSYSAGFMTAVSLPSFFNVIPHANGLEQISGGDTNMFRLAKEQGYE
               ||||||||||||||||||||||||||||||||:|||||||||||||||||||||||||||
orf81-1        LTRLSQADFKPIVKQSYSAGFMTAVSLPSFFNAIPHANGLEQISGGDTNMFRLAKEQGYE
                      250       260       270       280       290       300

                      310       320       330       340       350       360
orf81ng-1.pep  TYFYSAQAENQMAILNLIGKKWIDHLIQPTQLGYGNGDNMPDEKLLPLFDKINLQQGRHF
               ||||||||||:||||||||||||||||||||||||||||||||||||||||||||||:||
orf81-1        TYFYSAQAENEMAILNLIGKKWIDHLIQPTQLGYGNGDNMPDEKLLPLFDKINLQQGKHF
                      310       320       330       340       350       360

                      370       380       390       400       410       420
orf81ng-1.pep  IVLHQRGSHAPYGALLQPQDKVFGEADIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLF
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf81-1        IVLHQRGSHAPYGALLQPQDKVFGEADIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLF
                      370       380       390       400       410       420

                      430       440       450       460       470       480
orf81ng-1.pep  AYTSDHGQYVRQDIYNQGTVQPDSYIVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTF
               |||||||||||||||||||||||||:||||||||||||||||||||||||||||||||||
orf81-1        AYTSDHGQYVRQDIYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTF
                      430       440       450       460       470       480

                      490       500       510       520
orf81ng-1.pep  LIHTLGYDMPVSGCREGSVTGNLITGDAGSLNIRNGKAEYVYPQX
               ||||||||||||||||||||||||||||||||||:||||||||||
orf81-1        LIHTLGYDMPVSGCREGSVTGNLITGDAGSLNIRDGKAEYVYPQX
                      490       500       510       520

Furthermore, ORF81ng shows significant homology to an E. coli OMP:

gi|1256380 (U50906) outer membrane adherence protein-associated protein [E. coli]
Length = 547

Score = 87.4 bits (213), Expect = 2e-16

Identities = 122/468 (26%), Positives = 198/468 (42%), Gaps = 70/468 (14%)

Query: 25 VFGIETLPAAKMAETFA-LTFMIAALYLFARYKAS--RLLIAVFFAFSMIANNVHYAVYQ 81
VFGI LA+A LF+++R + RLL+A F + A ++ ++Y

Sbjct: 29 VFGITNLVASSGAHMVQRLLFFVLTILVVKRISSLPLRLLVAAPFVL-LTAADMSISLY- 86

Query: 82 SWMT GINYWLMLKEVTEVGSAGASMLDKLWLPALWGVAEVMLFCSLAKFRRKT 134

SW T G ++ + EV A ML ++ P L A + L +
Sbjct: 87 SWCTFGTTFNDGFAISVLQSDPDEV AKMLG-MYSPYLCAFAFLSLLFLAVIIKYDV 141






Query: 


25 


Sbjct: 


29 


Query; 


82 


Sbjct: 


87 


Query: 


135 


Sbjct: 


142 


Query: 


184 


Sbjct: 


202 


Query: 


242 


Sbjct : 


258 


Query: 


299 


Sbjct: 


311 


Query: 


356 


Sbjct: 


360 



rLMLMIFVRSF DTKQEHGISPKPTYSRIKAN — YFSFGYFVG 183 

L+L++ S D K ++ SP SR +F+ YF 



+Q L + +P F+ + I VLI+GES ++ L+GY R T+P + 



+Q + Q+ S TA+S+P + +V+ H I N+ +A + G 

-AQRKQIKLFNQAISGAPYTALSVPLSLTADSVLSH DIHNYPDNI INMANQAG 310 



++T++ S+Q+ +N A+ ++ ++ + Y G DE LLP + Q

"FWLSSQSAFRQNGTAVTSI AMRAME T V YVRG F DELLLPHLSQALQQ 359 



Q + IVLH GSH P + VF D D YDN+IH TD ++ VFE L+ 



Query: 413 QPDGNWLFAYTSDHG QYVRQDIYNQG--TVQPDSYIVPL-VLYSP 454

D Y +DHG ++++Y G +Y VP+ + YSP 
Sbjct: 419 - - DRRAS VM Y FAD H GLER DPT KKNV Y FHG GRE AS Q Q A YH V PMFIWYSP 464

Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 37 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 311>:

1 . . .ACCCTGCTCC TCTTCATCCC CCTCGTCCTC ACAC.GTGCG GCACACTGAC 

51 CGGCATACTC GCCCaCGGCG GCGGCAAACG CTTTGCCGTC GAACAAGAAC 

101 TCGTCGCCGC ATCGTCCCGC GCCGCCGTCA AAGAAATGGA TTTGTCCGCC 

151 yTAAAAGGAC GCAAAGCCGC CyTTTACGTC TCCGTTATGG GCGACCAAGG 

201 TTCGGGCAAC ATAAGCGGCG GACGCTACTC TATCGACGCA CTGATACGCG 

251 GCGGCTACCA CAACAACCCC GAAAGTGCCA CCCAATACAG CTACCCCGCC 

301 TACGACACTA CCGCCACCAC CAAATCCGAC GCGCTCTCCA GCGTAACCAC

351 TTCCACATCG CTTTTGAACG CCCCCGCCGC CGyCyTGACG AAAAACAGCG 

401 GACGCAAAGG CGAACGcTCC GCCGGACTGT CCGTCAACGG CACGGGCGAC 

451 TACCGCAACG AAACCCTGCT CGCCAACCCC CGCGACGTTT CCTTCCTGAC

501 CAACCTCATC CAAACCGTCT TCTACCTGCG CGGCATCGAA GTCgTACCGC 

551 CCGrATACGC CGACACCGAC GTATTCGTAA CCGTCGACGT A. . . 

This corresponds to the amino acid sequence <SEQ ID 312; ORF83>: 

1 . . TLLLFIPLVL TXCGTLTGIL AHGGGKRFAV EQELVAASSR AAVKEMDLSA 

51 LKGRKAAXYV SVMGDQGSGN ISGGRYSIDA LIRGGYHNNP ESATQYSYPA 

101 YDTTATTKSD ALSSVTTSTS LLNAPAAXLT KNSGRKGERS AGLSVNGTGD 

151 YRNETLLANP RDVSFLTNLI QTVFYLRGIE VVPPXYADTD VFVTVDV. . 
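
The lower-case letters y and r in the partial DNA sequence are read here as IUPAC ambiguity codes (y = C or T, r = A or G), and the X residues in the partial amino acid sequence arise where an ambiguous codon could encode more than one amino acid. The following Python sketch illustrates this under the assumption of the standard genetic code; the helper names `possible_residues` and `call_residue` are hypothetical and are not taken from the original work.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def possible_residues(codon: str) -> set[str]:
    """All amino acids that an (optionally ambiguous) codon could encode."""
    options = [IUPAC[base] for base in codon.upper()]
    return {CODON_TABLE["".join(c)] for c in product(*options)}


def call_residue(codon: str) -> str:
    """One-letter residue if the codon is unambiguous at protein level, else 'X'."""
    residues = possible_residues(codon)
    return residues.pop() if len(residues) == 1 else "X"


print(call_residue("GrA"))  # X  (GAA = E, GGA = G)
print(call_residue("CCy"))  # P  (CCC and CCT both encode proline)
```

The codon `GrA` at nucleotides 553-555 of SEQ ID 311 is an instance of the first case: it yields residue X185 in ORF83, whereas the complete sequence <SEQ ID 313> reads GAA (glutamate) at the corresponding codon.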

Further work revealed the complete nucleotide sequence <SEQ ID 313>: 

1 ATGAAAACCC TGCTCCTCCT CATCCCCCTC GTCCTCACAG CCTGCGGCAC 

51 ACTGACCGGC ATACCCGCCC ACGGCGGCGG CAAACGCTTT GCCGTCGAAC 

101 AAGAACTCGT CGCCGCATCG TCCCGCGCCG CCGTCAAAGA AATGGATTTG 

151 TCCGCCCTAA AAGGACGCAA AGCCGCCCTT TACGTCTCCG TTATGGGCGA 

201 CCAAGGTTCG GGCAACATAA GCGGCGGACG CTACTCTATC GACGCACTGA 

251 TACGCGGCGG CTACCACAAC AACCCCGAAA GTGCCACCCA ATACAGCTAC 

301 CCCGCCTACG ACACTACCGC CACCACCAAA TCCGACGCGC TCTCCAGCGT 

351 AACCACTTCC ACATCGCTTT TGAACGCCCC CGCCGCCGCC CTGACGAAAA 

401 ACAGCGGACG CAAAGGCGAA CGCTCCGCCG GACTGTCCGT CAACGGCACG

451 GGCGACTACC GCAACGAAAC CCTGCTCGCC AACCCCCGCG ACGTTTCCTT

501 CCTGACCAAC CTCATCCAAA CCGTCTTCTA CCTGCGCGGC ATCGAAGTCG 

551 TACCGCCCGA ATACGCCGAC ACCGACGTAT TCGTAACCGT CGACGTATTC 

601 GGCACCGTCC GCAGCCGTAC CGAACTGCAC CTCTACAACG CCGAAACCCT 

651 TAAAGCCCAA ACCAAGCTCG AATATTTCGC CGTTGACCGC GACAGCCGGA 

701 AACTGCTGAT TACCCCTAAA ACCGCCGCCT ACGAATCCCA ATACCAAGAA 

751 CAATACGCCC TTTGGACCGG CCCTTACAAA GTCAGCAAAA CCGTCAAAGC 

801 CTCAGACCGC CTGATGGTCG ATTTCTCCGA CATTACCCCC TACGGCGACA 

851 CAACCGCCCA AAACCGTCCC GACTTCAAAC AAAACAACGG TAAAAAACCC 

901 GATGTCGGCA ACGAAGTCAT CCGCCGCCGC AAAGGAGGAT AA 

This corresponds to the amino acid sequence <SEQ ID 314; ORF83-1>:

1 MKTLLLLIPL VLTACGTLTG IPAHGGGKRF AVEQELVAAS SRAAVKEMDL

51 SALKGRKAAL YVSVMGDQGS GNISGGRYSI DALIRGGYHN NPESATQYSY

101 PAYDTTATTK SDALSSVTTS TSLLNAPAAA LTKNSGRKGE RSAGLSVNGT

151 GDYRNETLLA NPRDVSFLTN LIQTVFYLRG IEVVPPEYAD TDVFVTVDVF

201 GTVRSRTELH LYNAETLKAQ TKLEYFAVDR DSRKLLITPK TAAYESQYQE

251 QYALWTGPYK VSKTVKASDR LMVDFSDITP YGDTTAQNRP DFKQNNGKKP 

301 DVGNEVIRRR KGG* 
Computer analysis of this amino acid sequence gave the following results:
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF83 shows 96.4% identity over a 197aa overlap with an ORF (ORF83a) from strain A of N. 
meningitidis: 

                      10        20        30        40        50
orf83.pep     TLLLFIPLVLTXCGTLTGILAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAX
              ||| :|||||| ||||||| |||||||||||||||||||||||||||||||||||||
orf83a      MKTLLXLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL
                    10        20        30        40        50        60

            60        70        80        90       100       110
orf83.pep   YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf83a      YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS
                    70        80        90       100       110       120

           120       130       140       150       160       170
orf83.pep   TSLLNAPAAXLTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG
            ||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||
orf83a      TSLLNAPAAALTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG
                   130       140       150       160       170       180

           180       190
orf83.pep   IEVVPPXYADTDVFVTVDV
            |||||| ||||||||||||
orf83a      IEVVPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLIAPK
                   190       200       210       220       230       240

The complete length ORF83a nucleotide sequence <SEQ ID 315> is: 



1 ATGAAAACCC TGCTCNTCCT CATCCCCCTC GTCCTCACAG CCTGCGGCAC 

51 ACTGACCGGC ATACCCGCCC ACGGCGGCGG CAAACGCTTT GCCGTCGAAC 

101 AAGAACTCGT CGCCGCATCG TCCCGCGCCG CCGTCAAAGA AATGGACTTG 

151 TCCGCCCTGA AAGGACGCAA AGCCGCCCTT TACGTCTCCG TTATGGGCGA 

201 CCAAGGTTCG GGCAACATAA GCGGCGGACG CTACTCTATC GACGCACTGA 

251 TACGCGGCGG CTACCACAAC AACCCCGAAA GTGCCACCCA ATACAGCTAC 

301 CCCGCCTACG ACACTACCGC CACCACCAAA TCCGACGCGC TCTCCAGCGT 

351 AACCACTTCC ACATCGCTTT TGAACGCCCC CGCCGCCGCC CTGACGAAAA 

401 ACAGCGGACG CAAAGGCGAA CGCTCCGCCG GACTGTCCGT CAACGGCACG 

451 GGCGACTACC GCAACGAAAC CCTGCTCGCC AACCCCCGCG ACGTTTCCTT 

501 CCTGACCAAC CTCATCCAAA CCGTCTTCTA CCTGCGCGGC ATCGAAGTCG 

551 TACCGCCCGA ATACGCCGAC ACCGACGTAT TCGTAACCGT CGACGTATTC 

601 GGCACCGTCC GCAGCCGCAC CGAACTGCAC CTCTACAACG CCGAAACCCT 

651 TAAAGCCCAA ACCAAGCTCG AATATTTCGC CGTTGACCGC GACAGCCGGA 

701 AACTGCTGAT TGCCCCTAAA ACCGCCGCCT ACGAATCCCA ATACCAAGAA 

751 CAATACGCCC TCTGGATGGG ACCTTACAGC GTCGGCAAAA CCGTCAAAGC 

801 CTCAGACCGC CTGATGGTCG ATTTCTCCGA CATCACCCCC TACGGCGACA 

851 CAACCGCCCA AAACCGTCCC GACTTCAAAC AAAACAACGG TAAAAAACCC 

901 GATGTCGGCA ACGAAGTCAT CCGCCGCCGC AAAGGAGGAT AA 

This encodes a protein having amino acid sequence <SEQ ID 316>: 



1 MKTLLXLIPL VLTACGTLTG IPAHGGGKRF AVEQELVAAS SRAAVKEMDL

51 SALKGRKAAL YVSVMGDQGS GNISGGRYSI DALIRGGYHN NPESATQYSY 

101 PAYDTTATTK SDALSSVTTS TSLLNAPAAA LTKNSGRKGE RSAGLSVNGT 

151 GDYRNETLLA NPRDVSFLTN LIQTVFYLRG IEVVPPEYAD TDVFVTVDVF

201 GTVRSRTELH LYNAETLKAQ TKLEYFAVDR DSRKLLIAPK TAAYESQYQE 

251 QYALWMGPYS VGKTVKASDR LMVDFSDITP YGDTTAQNRP DFKQNNGKKP 

301 DVGNEVIRRR KGG* 

ORF83a and ORF83-1 show 98.4% identity in 313 aa overlap: 



                    10        20        30        40        50        60
orf83a.pep  MKTLLXLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL
            ||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf83-1     MKTLLLLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf83a.pep  YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf83-1     YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf83a.pep  TSLLNAPAAALTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf83-1     TSLLNAPAAALTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf83a.pep  IEVVPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLIAPK
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||:||
orf83-1     IEVVPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLITPK
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf83a.pep  TAAYESQYQEQYALWMGPYSVGKTVKASDRLMVDFSDITPYGDTTAQNRPDFKQNNGKKP
            ||||||||||||||| |||:|:||||||||||||||||||||||||||||||||||||||
orf83-1     TAAYESQYQEQYALWTGPYKVSKTVKASDRLMVDFSDITPYGDTTAQNRPDFKQNNGKKP
                   250       260       270       280       290       300

                   310
orf83a.pep  DVGNEVIRRRKGGX
            ||||||||||||||
orf83-1     DVGNEVIRRRKGGX
                   310
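
For closely related sequences such as these, it can be easier to list the individual substitutions than to scan the alignment by eye. The sketch below is illustrative only; it assumes an ungapped comparison of equal-length segments, and the helper name `substitutions` is hypothetical.

```python
def substitutions(a: str, b: str, offset: int = 1) -> list[tuple[int, str, str]]:
    """Return (position, residue_in_a, residue_in_b) for every differing column.

    `offset` is the residue number of the first column of the segment.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return [(offset + i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Residues 231-270 of ORF83a and ORF83-1, taken from the listings above.
orf83a = "DSRKLLIAPKTAAYESQYQEQYALWMGPYSVGKTVKASDR"
orf83_1 = "DSRKLLITPKTAAYESQYQEQYALWTGPYKVSKTVKASDR"

for pos, x, y in substitutions(orf83a, orf83_1, offset=231):
    print(f"{x}{pos}{y}")
# A238T
# M256T
# S260K
# G262S
```

Together with the undetermined residue X6 in ORF83a, these four substitutions account for the five non-identical positions, i.e. 308 identities out of 313 (98.4%).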

Homology with a predicted ORF from N. gonorrhoeae

ORF83 shows 94.9% identity over a 197aa overlap with a predicted ORF (ORF83.ng) from N.
gonorrhoeae:

orf83.pep     TLLLFIPLVLTXCGTLTGILAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAX  58
              ||||:|||||| ||||||| |||||||||||||||||||||||||||||||||||||
orf83ng     MKTLLLLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL  60

orf83.pep   YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS  118
            ||||||||||||||||||||||||||||||||:|||:||||||||||||||||||:||||
orf83ng     YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPDSATRYSYPAYDTTATTKSDALSGVTTS  120

orf83.pep   TSLLNAPAAXLTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG  178
            ||||||||| ||||:|||||||||||||||||||||||||||||||||||||||||||||
orf83ng     TSLLNAPAAALTKNNGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG  180

orf83.pep   IEVVPPXYADTDVFVTVDV  197
            |||||| ||||||||||||
orf83ng     IEVVPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLIAPK  240




The complete length ORF83ng nucleotide sequence <SEQ ID 317> is:

1 ATGAAAACCC TGCTCCTCCT CATCCCCCTC GTACTCACCG CCTGCGGCAC

51 ACTGACCGGC ATACCCGCCC ACGGCGGCGG CAAACGCTTT GCCGTCGAAC

101 AGGAACTCGT CGCCGCATCG TCCCGCGCCG CCGTCAAAGA AATGGACTTG

151 TCCGCCCTGA AAGGACGCAA AGCCGCCCTT TACGTCTCCG TTATGGGCGA

201 CCAAGGTTCG GGCAACATAA GCGGCGGACG CTACTCCATC GACGCACTGA

251 TACGCGGCGG CTACCACAAC AACCCCGACA GCGCCACCCG ATACAGCTAC

301 CCCGCCTATG ACACTACCGC CACCACCAAA TCCGACGCGC TCTCCGGCGT

351 AACCACTTCC ACATCGCTTT TGAACGCCCC CGCCGCCGCC CTGACGAAAA

401 ACAACGGACG CAAAGGCGAA CGCTCCGCCG GACTGTCCGT CAACGGCACG

451 GGCGACTACC GCAACGAAAC CCTGCTCGCC AACCCCCGCG ACGTTTCCTT

501 CCTGACCAAC CTCATCCAAA CCGTCTTCTA CCTGCGCGGC ATCGAAGTCG

551 TACCGCCCGA ATACGCCGAC ACCGACGTAT TCGTAACCGT CGACGTATTC

601 GGCACCGTCC GCAGCCGTAC CGAACTGCAC CTCTACAACG CCGAAACCCT 

651 TAAAGCCCAA ACCAAGCTCG AATATTTCGC CGTCGACCGC GACAGCCGGA 

701 AACTGCTGAT TGCCCCTAAA ACCGCCGCCT ACGAATCCCA ATACCAAGAA 

751 CAATACGCCC TCTGGATGGG ACCTTACAGC GTCGGCAAAA CCGTCAAAGC

801 CTCAGACCGC CTGATGGTCG ATTTCTCCGA CATCACCCCC TACGGCGACA 

851 CAACCGCCCA AAACCGTCCC GACTTCAAAC AAAACAACGG TAAAAACCCC 

901 GATGTCGGCA ACGAAGTCAT CCGCCGCCGC AAAGGAGGAT AA 

This encodes a protein having amino acid sequence <SEQ ID 318>: 



1 MKTLLLLIPL VLTACGTLTG IPAHGGGKRF AVEQELVAAS SRAAVKEMDL

51 SALKGRKAAL YVSVMGDQGS GNISGGRYSI DALIRGGYHN NPDSATRYSY

101 PAYDTTATTK SDALSGVTTS TSLLNAPAAA LTKNNGRKGE RSAGLSVNGT

151 GDYRNETLLA NPRDVSFLTN LIQTVFYLRG IEVVPPEYAD TDVFVTVDVF

201 GTVRSRTELH LYNAETLKAQ TKLEYFAVDR DSRKLLIAPK TAAYESQYQE

251 QYALWMGPYS VGKTVKASDR LMVDFSDITP YGDTTAQNRP DFKQNNGKNP

301 DVGNEVIRRR KGG* 

ORF83ng and ORF83-1 show 97.1% identity in 313 aa overlap:

                     10        20        30        40        50        60
orf83-1.pep  MKTLLLLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf83ng      MKTLLLLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf83-1.pep  YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS
             ||||||||||||||||||||||||||||||||:|||:||||||||||||||||||:||||
orf83ng      YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPDSATRYSYPAYDTTATTKSDALSGVTTS
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf83-1.pep  TSLLNAPAAALTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG
             ||||||||||||||:|||||||||||||||||||||||||||||||||||||||||||||
orf83ng      TSLLNAPAAALTKNNGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf83-1.pep  IEVVPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLITPK
             |||||||||||||||||||||||||||||||||||||||||||||||||||||||||:||
orf83ng      IEVVPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLIAPK
                    190       200       210       220       230       240

                    250       260       270       280       290       300
orf83-1.pep  TAAYESQYQEQYALWTGPYKVSKTVKASDRLMVDFSDITPYGDTTAQNRPDFKQNNGKKP
             ||||||||||||||| |||:|:||||||||||||||||||||||||||||||||||||:|
orf83ng      TAAYESQYQEQYALWMGPYSVGKTVKASDRLMVDFSDITPYGDTTAQNRPDFKQNNGKNP
                    250       260       270       280       290       300

                    310
orf83-1.pep  DVGNEVIRRRKGGX
             ||||||||||||||
orf83ng      DVGNEVIRRRKGGX
                    310

Based on this analysis, including the presence of a putative ATP/GTP-binding site motif A (P-loop)
in the gonococcal protein (double-underlined) and a putative prokaryotic membrane lipoprotein
lipid attachment site (single-underlined), it is predicted that the proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.
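
A motif such as the ATP/GTP-binding site motif A can be located with a simple pattern search. The sketch below is an illustration only: it assumes the PROSITE consensus [AG]-x(4)-G-K-[ST] for the P-loop, and the helper name `find_p_loops` is hypothetical rather than the analysis software used here.

```python
import re

# PROSITE-style consensus for ATP/GTP-binding site motif A (P-loop):
# [AG]-x(4)-G-K-[ST], written as a regular expression.
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_p_loops(protein: str, offset: int = 1) -> list[tuple[int, str]]:
    """Return (start_position, matched_segment) for each non-overlapping hit."""
    return [(offset + m.start(), m.group()) for m in P_LOOP.finditer(protein)]


# Residues 251-270 of the gonococcal (ORF83ng) and meningococcal (ORF83-1) proteins.
print(find_p_loops("QYALWMGPYSVGKTVKASDR", offset=251))  # [(257, 'GPYSVGKT')]
print(find_p_loops("QYALWTGPYKVSKTVKASDR", offset=251))  # []
```

Under this assumed consensus the segment GPYSVGKT (residues 257-264) matches in the gonococcal sequence, while the corresponding meningococcal segment in ORF83-1 (GPYKVSKT) does not, which is consistent with the motif being reported for the gonococcal protein.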






Example 38 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 
319>: 

1 ATGGCAGAGA TCTGTTTGAT AACCGGCACG CCCGGTTCAG GGAAAACATT 

51 AAAAATGGTT TCCATGATGG CGAATGATGA AATGTTTAAG CCTGATGAAA 

101 AAGCCATACG CCGTAAAGTA TTTACGAACA TAAAAGGCTT GAAAATACCG 

151 CACACCTACA TAGAAACGGA CGCAAAAAAG CTGCCGAAAT CGACAGATGA 

201 GCAGCTTTCG GCGCATGATA TGTACGAATG GATAAAGAAG CCCGAAAATA 

251 TCGGGTCTAT TGTCATTGTA GATGAAGCTC AAGACGTATG GCCGGCACGC 

301 TCGGCAGGTT CAAAAATCCC TGAAAATGTC CAATGGCTGA ATACGCACAG 

351 ACATCAGGGC ATTGATATAT TTGTTTTGAC TCAAGGTCCT AAGCTTCTAG 

401 ATCAAAATCT TAGAACGCTT GTACGGAAAC ATTACCACAT CGCTTCAAAC 

451 AAGATGGGTA TGCGTACGCT TTTAGAATGG AAAATATGCG CGGACGATCC 

501 CGTAAAAATG GCATCAAGCG CATTCTCCAG TATCTATACA CTGGATAAAA

551 AAGTTTATGA CTTGTAysrr TmmGCGGAAG TTCATACCGT AAATAAGGTC 

601 AAGCGGTCAA AGTGGTTTTA CACTCTGCCa GTAATAGTAT TGCTGATTCC 

651 CGTGTTTGTC GGCCTGTCCT ATAAAATGTT GagCaGTTAC GGAAAAAAAC 

701 aGGAAGAACC CGCAGCACAA GAATCGGCGG CAACAGAACA GCAGGCAGTA

751 CTTCCGGATA AAACAGAAGG CGAGCCGGTA AATAACGGCA ACCTTACCGC

801 AGATATGTTT GTTCCGACAT TGTCCGAaAA ACCCGrAAGC AAGCcgaTTT 

851 ATAACGGTGT AAGGCAGGTA AGAACCTTTG AATATATAGC AGGCTGTATA 

901 GAAGGCGGAA GAACCGGATG CGCCTGCTAT TCGCaTCAAG GGACGGCATt 

951 gaAAGAAGTG ACGGaGTTGA TGTGcgaAgG aCTATGTaAA AAacGGCTTG 

1001 CCGTTTAACC CaTACAAAGA AGAAAGCCAA GGGCAGGAAG TTCAGCAAAG 

1051 CGCGCAgCAA CATTCGGACA GGGCGgCAAG TTGCCACATT GGGCGGAAAA 

1101 CCGTAGCAGA ACCTAATGTA CGATAATTGG GAAGAACGCG GGAAACCGTT 

1151 TGAAGGAATC GGaCGGGGGC GTGGTCGGAT CGGCAAACTG A 

This corresponds to the amino acid sequence <SEQ ID 320; ORF84>: 

1 MAEICLITGT PGSGKTLKMV SMMANDEMFK PDEKAIRRKV FTNIKGLKIP 

51 HTYIETDAKK LPKSTDEQLS AHDMYEWIKK PENIGSIVIV DEAQDVWPAR 

101 SAGSKIPENV QWLNTHRHQG IDIFVLTQGP KLLDQNLRTL VRKHYHIASN 

151 KMGMRTLLEW KICADDPVKM ASSAFSSIYT LDKKVYDLYX XAEVHTVNKV 

201 KRSKWFYTLP VIVLLIPVFV GLSYKMLSSY GKKQEEPAAQ ESAATEQQAV 

251 LPDKTEGEPV NNGNLTADMF VPTLSEKPXS KPIYNGVRQV RTFEYIAGCI 

301 EGGRTGCACY SHQGTALKEV TELMCKDYVK NGLPFNPYKE ESQGQEVQQS 

351 AQQHSDRAQV ATLGGKPXQN LMYDNWEERG KPFEGIGGGV VGSAN* 
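
The relationship between a nucleotide listing such as SEQ ID 319 and its predicted protein (SEQ ID 320) can be checked by translating the open reading frame codon by codon. The following is a minimal Python sketch, assuming the standard genetic code and reading frame 1. The function name `translate`, and the choice to report any codon containing a base other than A, C, G or T as X, are illustrative assumptions and not part of the original analysis.

```python
# Standard genetic code, laid out in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; unrecognised codons become 'X'."""
    # Drop the spaces used to group the listing into blocks of ten bases.
    seq = "".join(dna.split()).upper()
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[pos:pos + 3], "X"))
    return "".join(protein)


# First 30 bases of SEQ ID 319 as listed above.
print(translate("ATGGCAGAGA TCTGTTTGAT AACCGGCACG"))  # prints MAEICLITGT
```

This simple version does not resolve IUPAC ambiguity codes, so a codon such as TAy, which encodes tyrosine for either value of y, is reported as X here.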

Further work revealed the complete nucleotide sequence <SEQ ID 321>:

1 ATGGCAGAGA TCTGTTTGAT AACCGGCACG CCCGGTTCAG GGAAAACATT 

51 AAAAATGGTT TCCATGATGG CGAATGATGA AATGTTTAAG CCTGATGAAA 

101 ACGGCATACG CCGTAAAGTA TTTACGAACA TAAAAGGCTT GAAAATACCG 

151 CACACCTACA TAGAAACGGA CGCAAAAAAG CTGCCGAAAT CGACAGATGA 

201 GCAGCTTTCG GCGCATGATA TGTACGAATG GATAAAGAAG CCCGAAAATA 

251 TCGGGTCTAT TGTCATTGTA GATGAAGCTC AAGACGTATG GCCGGCACGC 

301 TCGGCAGGTT CAAAAATCCC TGAAAATGTC CAATGGCTGA ATACGCACAG 

351 ACATCAGGGC ATTGATATAT TTGTTTTGAC TCAAGGTCCT AAGCTTCTAG 

401 ATCAAAATCT TAGAACGCTT GTACGGAAAC ATTACCACAT CGCTTCAAAC

451 AAGATGGGTA TGCGTACGCT TTTAGAATGG AAAATATGCG CGGACGATCC

501 CGTAAAAATG GCATCAAGCG CATTCTCCAG TATCTATACA CTGGATAAAA 

551 AAGTTTATGA CTTGTACGAA TCAGCGGAAG TTCATACCGT AAATAAGGTC 

601 AAGCGGTCAA AGTGGTTTTA CACTCTGCCA GTAATAGTAT TGCTGATTCC 

651 CGTGTTTGTC GGCCTGTCCT ATAAAATGTT GAGCAGTTAC GGAAAAAAAC 

701 AGGAAGAACC CGCAGCACAA GAATCGGCGG CAACAGAACA GCAGGCAGTA 

751 CTTCCGGATA AAACAGAAGG CGAGCCGGTA AATAACGGCA ACCTTACCGC 

801 AGATATGTTT GTTCCGACAT TGTCCGAAAA ACCCGAAAGC AAGCCGATTT 

851 ATAACGGTGT AAGGCAGGTA AGAACCTTTG AATATATAGC AGGCTGTATA 

901 GAAGGCGGAA GAACCGGATG CGCCTGCTAT TCGCATCAAG GGACGGCATT 

951 GAAAGAAGTG ACGGAGTTGA TGTGCAAGGA CTATGTAAAA AACGGCTTGC 

1001 CGTTTAACCC ATACAAAGAA GAAAGCCAAG GGCAGGAAGT TCAGCAAAGC 

1051 GCGCAGCAAC ATTCGGACAG GGCGCAAGTT GCCACATTGG GCGGAAAACC 

1101 GTAGCAGAAC CTAATGTACG ATAATTGGGA AGAACGCGGG AAACCGTTTG 

1151 AAGGAATCGG CGGGGGCGTG GTCGGATCGG CAAACTGA

This corresponds to the amino acid sequence <SEQ ID 322; ORF84-1>:

1 MAEICLITGT PGSGKTLKMV SMMANDEMFK PDENGIRRKV FTNIKGLKIP 

51 HTYIETDAKK LPKSTDEQLS AHDMYEWIKK PENIGSIVIV DEAQDVWPAR

101 SAGSKIPENV QWLNTHRHQG IDIFVLTQGP KLLDQNLRTL VRKHYHIASN

151 KMGMRTLLEW KICADDPVKM ASSAFSSIYT LDKKVYDLYE SAEVHTVNKV

201 KRSKWFYTLP VIVLLIPVFV GLSYKMLSSY GKKQEEPAAQ ESAATEQQAV

251 LPDKTEGEPV NNGNLTADMF VPTLSEKPES KPIYNGVRQV RTFEYIAGCI 

301 EGGRTGCACY SHQGTALKEV TELMCKDYVK NGLPFNPYKE ESQGQEVQQS 

351 AQQHSDRAQV ATLGGKP*QN LMYDNWEERG KPFEGIGGGV VGSAN* 

Computer analysis of this amino acid sequence gave the following results:
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF84 shows 93.9% identity over a 395aa overlap with an ORF (ORF84a) from strain A of N.
meningitidis:

                    10        20        30        40        50        60
orf84.pep   MAEICLITGTPGSGKTLKMVSMMANDEMFKPDEKAIRRKVFTNIKGLKIPHTYIETDAKK
            |||||||||||||||||||||||||||||||||::|||||||||||||||||||||||||
orf84a      MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGIRRKVFTNIKGLKIPHTYIETDAKK
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf84.pep   LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf84a      LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf84.pep   IDIFVLTQGPKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT
            ||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||
orf84a      IDIFVLTQGSKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf84.pep   LDKKVYDLYXXAEVHTVNKVKRSKWFYTLPVIVLLIPVFVGLSYKMLSSYGKKQEEPAAQ
            |||||||||  |||||||||||||||||||||:|||||||||||||||||||||||||||
orf84a      LDKKVYDLYESAEVHTVNKVKRSKWFYTLPVIILLIPVFVGLSYKMLSSYGKKQEEPAAQ
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf84.pep   ESAATEQQAVLPDKTEGEPVNNGNLTADMFVPTLSEKPXSKPIYNGVRQVRTFEYIAGCI
            ||||||:|||: |||||||||||||||||||||||||| ||||||||||||||||||||:
orf84a      ESAATEHQAVFQDKTEGEPVNNGNLTADMFVPTLSEKPESKPIYNGVRQVRTFEYIAGCV
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf84.pep   EGGRTGCACYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV
            |||||||:|||||||||||:|: |||||::||||||||||||||::|||| |:|||| ||
orf84a      EGGRTGCTCYSHQGTALKEITKEMCKDYARNGLPFNPYKEESQGRDVQQSEQHHSDRPQV
                   310       320       330       340       350       360

                   370       380       390
orf84.pep   ATLGGKPXQNLMYDNWEERGKPFEGIGGGVVGSANX
            ||||||| ||||||||:|||||||||||||||||||
orf84a      ATLGGKPWQNLMYDNWQERGKPFEGIGGGVVGSANX
                   370       380       390
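
The identity figure quoted for an alignment is the number of aligned positions carrying the same residue, divided by the length of the overlap. A minimal Python sketch of that calculation follows, assuming two pre-aligned, ungapped strings of equal length. The helper name `percent_identity` is hypothetical, and treating a position with the placeholder X as a mismatch is an assumption about how the original program scored undetermined residues.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions that carry the same residue."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1 for a, b in zip(seq_a, seq_b)
        if a == b and a != "X"  # an undetermined residue never counts as a match
    )
    return 100.0 * matches / len(seq_a)


# Residues 1-60 of ORF84 and ORF84a, from the first block of the alignment above.
orf84 = "MAEICLITGTPGSGKTLKMVSMMANDEMFKPDEKAIRRKVFTNIKGLKIPHTYIETDAKK"
orf84a = "MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGIRRKVFTNIKGLKIPHTYIETDAKK"
print(f"{percent_identity(orf84, orf84a):.1f}%")  # prints 96.7%
```

Applied to the full 395-residue overlap rather than this 60-residue block, the same ratio gives the whole-alignment figure.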

The complete length ORF84a nucleotide sequence <SEQ ID 323> is:

1 ATGGCAGAGA TCTGTTTGAT AACCGGCACG CCCGGTTCAG GGAAAACATT 

51 AAAAATGGTT TCCATGATGG CAAACGATGA AATGTTTAAG CCGGATGAAA 

101 ACGGCATACG CCGTAAAGTA TTTACGAACA TCAAAGGCTT GAAGATACCG 

151 CACACCTACA TAGAAACGGA CGCGAAAAAG CTGCCGAAAT CGACAGATGA 

201 GCAGCTTTCG GCGCATGATA TGTACGAATG GATAAAGAAG CCCGAAAATA






251 TCGGGTCTAT TGTCATTGTA GATGAAGCTC AAGACGTATG GCCGGCACGC 

301 TCGGCAGGTT CAAAAATCCC TGAAAATGTC CAATGGCTGA ATACGCACAG 

351 ACATCAGGGC ATTGATATAT TTGTTTTGAC TCAAGGCTCT AAGCTTCTAG

401 ATCAAAATCT TAGAACGCTT GTACGGAAAC ATTACCACAT CGCTTCAAAC

451 AAGATGGGTA TGCGTACGCT TTTAGAATGG AAAATATGCG CGGACGATCC

501 CGTAAAAATG GCATCAAGCG CATTCTCCAG TATCTATACA CTGGATAAAA

551 AAGTTTATGA CTTGTACGAA TCAGCGGAAG TTCATACCGT AAATAAGGTC 

601 AAGCGGTCAA AATGGTTTTA TACTCTGCCA GTAATAATAT TGCTGATTCC 

651 CGTTTTTGTC GGCCTGTCCT ATAAAATGTT AAGTAGTTAT GGAAAAAAAC 

701 AGGAAGAACC CGCAGCACAA GAATCGGCGG CAACAGAACA TCAGGCAGTA 

751 TTTCAGGATA AAACAGAAGG CGAGCCGGTA AACAACGGTA ACCTTACCGC 

801 AGATATGTTT GTTCCGACAT TGTCCGAAAA ACCCGAAAGC AAGCCGATTT 

851 ATAACGGTGT AAGGCAGGTA AGAACCTTTG AATATATAGC AGGCTGTGTA 

901 GAAGGCGGAA GAACCGGATG CACATGCTAT TCGCATCAAG GGACGGCATT 

951 GAAAGAAATT ACAAAGGAAA TGTGCAAGGA TTACGCAAGA AACGGATTGC 

1001 CGTTTAACCC ATATAAAGAA GAAAGCCAAG GGCGGGATGT CCAGCAAAGT 

1051 GAGCAGCACC ATTCGGACAG ACCGCAAGTT GCCACGTTGG GCGGAAAGCC 

1101 GTGGCAAAAT CTTATGTATG ATAATTGGCA GGAGCGCGGA AAACCGTTTG 

1151 AAGGAATCGG CGGGGGCGTG GTCGGATCGG CAAACTGA 

This encodes a protein having amino acid sequence <SEQ ID 324>: 



1 MAEICLITGT PGSGKTLKMV SMMANDEMFK PDENGIRRKV FTNIKGLKIP 

51 HTYIETDAKK LPKSTDEQLS AHDMYEWIKK PENIGSIVIV DEAQDVWPAR 

101 SAGSKIPENV QWLNTHRHQG IDIFVLTQGS KLLDQNLRTL VRKHYHIASN 

151 KMGMRTLLEW KICADDPVKM ASSAFSSIYT LDKKVYDLYE SAEVHTVNKV 

201 KRSKWFYTLP VIILLIPVFV GLSYKMLSSY GKKQEEPAAQ ESAATEHQAV

251 FQDKTEGEPV NNGNLTADMF VPTLSEKPES KPIYNGVRQV RTFEYIAGCV 

301 EGGRTGCTCY SHQGTALKEI TKEMCKDYAR NGLPFNPYKE ESQGRDVQQS 

351 EQHHSDRPQV ATLGGKPWQN LMYDNWQERG KPFEGIGGGV VGSAN* 

ORF84a and ORF84-1 show 95.2% identity in 395 aa overlap: 



                    10        20        30        40        50        60
orf84a.pep  MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGIRRKVFTNIKGLKIPHTYIETDAKK
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf84-1     MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGIRRKVFTNIKGLKIPHTYIETDAKK
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf84a.pep  LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf84-1     LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf84a.pep  IDIFVLTQGSKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT
            ||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||
orf84-1     IDIFVLTQGPKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf84a.pep  LDKKVYDLYESAEVHTVNKVKRSKWFYTLPVIILLIPVFVGLSYKMLSSYGKKQEEPAAQ
            ||||||||||||||||||||||||||||||||:|||||||||||||||||||||||||||
orf84-1     LDKKVYDLYESAEVHTVNKVKRSKWFYTLPVIVLLIPVFVGLSYKMLSSYGKKQEEPAAQ
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf84a.pep  ESAATEHQAVFQDKTEGEPVNNGNLTADMFVPTLSEKPESKPIYNGVRQVRTFEYIAGCV
            ||||||:|||: |||||||||||||||||||||||||||||||||||||||||||||||:
orf84-1     ESAATEQQAVLPDKTEGEPVNNGNLTADMFVPTLSEKPESKPIYNGVRQVRTFEYIAGCI
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf84a.pep  EGGRTGCTCYSHQGTALKEITKEMCKDYARNGLPFNPYKEESQGRDVQQSEQHHSDRPQV
            |||||||:|||||||||||:|: |||||::||||||||||||||::|||| |:|||| ||
orf84-1     EGGRTGCACYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV
                   310       320       330       340       350       360

                   370       380       390
orf84a.pep  ATLGGKPWQNLMYDNWQERGKPFEGIGGGVVGSANX
            ||||||| ||||||||:|||||||||||||||||||
orf84-1     ATLGGKPXQNLMYDNWEERGKPFEGIGGGVVGSANX
                   370       380       390

Homology with a predicted ORF from N. gonorrhoeae 

ORF84 shows 94.2% identity over a 395aa overlap with a predicted ORF (ORF84.ng) from N. 
gonorrhoeae: 



orf84.pep   MAEICLITGTPGSGKTLKMVSMMANDEMFKPDEKAIRRKVFTNIKGLKIPHTYIETDAKK  60
            |||||||||||||||||||||||||||||||||:::||||||||||||||||:|||||||
orf84ng     MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGVRRKVFTNIKGLKIPHTHIETDAKK  60

orf84.pep   LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG  120
            |||||||||||||||||||||||:|:||||||||||||||||||||||||||||||||||
orf84ng     LPKSTDEQLSAHDMYEWIKKPENVGAIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG  120

orf84.pep   IDIFVLTQGPKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT  180
            |||||||||||||||||||||::|||||:||||:|||||||:||||||||||||||||||
orf84ng     IDIFVLTQGPKLLDQNLRTLVKRHYHIAANKMGLRTLLEWKVCADDPVKMASSAFSSIYT  180

orf84.pep   LDKKVYDLYXXAEVHTVNKVKRSKWFYTLPVIVLLIPVFVGLSYKMLSSYGKKQEEPAAQ  240
            |||||||||  ||:|||||||||||||:||||:||||:|||||||||:||||||||||||
orf84ng     LDKKVYDLYESAEIHTVNKVKRSKWFYALPVIILLIPLFVGLSYKMLGSYGKKQEEPAAQ  240

orf84.pep   ESAATEQQAVLPDKTEGEPVNNGNLTADMFVPTLSEKPXSKPIYNGVRQVRTFEYIAGCI  300
            |||||||||||||||||| ||||||||||||||| ||| |||||||||||||||||||||
orf84ng     ESAATEQQAVLPDKTEGESVNNGNLTADMFVPTLPEKPESKPIYNGVRQVRTFEYIAGCI  300

orf84.pep   EGGRTGCACYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV  360
            |||||||:||||||||||||||||||||||||||||||||||||||||||||||||||||
orf84ng     EGGRTGCTCYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV  360

orf84.pep   ATLGGKPXQNLMYDNWEERGKPFEGIGGGVVGSAN  395
            ||||||| |||||||||||||||||||||||||||
orf84ng     ATLGGKPQQNLMYDNWEERGKPFEGIGGGVVGSAN  395

The complete length ORF84ng nucleotide sequence <SEQ ID 325> is: 

1 ATGGCAGAAA TCTGTTTGAT AACCGGCACG CCCGGTTCAG GGAAAACATT 

51 AAAAATGGTT TCCATGATGG CAAACGATGA AATGTTTAAG CCAGATGAAA 

101 ACGGCGTACG CCGTAAAGTA TTTACGAACA TCAAAGGTTT GAAGATACCG 

151 CACACCCACA TAGAAACAGA CGCAAAGAAG CTGCCGAAAT CAACCGATGA 

201 ACAGCTTTCG GCGCATGATA TGTATGAATG GATCAAGAAG CCTGAAAacg 

251 tcggcgCAAT CGTTATTGTC GATGAGGCGC AAGACGTATG GCCCGCACGC 

301 TccgCAGGTT CGAAAATCCC CGAAAACGTC CAATGGCTGA ACACACACAG 

351 GCATCAGGGC ATAGATATAT TTGTATTGAC ACAAGGTCCT AAACTCTTAG

401 ATCAGAACTT GCGAACATTG GTTAAAAGAC ATTACCACAT TGCGGCCAAC

451 AAAATGGGTT TGCGTACCCT GCTTGAATGG AAAGTATGCG CGGATGACCC

501 GGTAAAAATG GCATCAAGTG CATTTTCCAG TATCTACACA CTGGATAAAA 

551 AAGTTTATGA CTTGTACGAA TCCGCAGAAA TTCACACGGT AAACAAAGTC 

601 AAGCGTTCAA AATGGTTTTA TGCATTGCCC GTCATCATAT TATTGATTCC

651 GCTATTTGTC GGTTTGTCTT ACAAAATGTT GGGCAGTTAC GGAAAAAAAC 

701 AGGAAGAACC CGCAGCACAA GAATCGGCGG CAACAGAACA GCAGGCAGTA

751 CTTCCGGATA AAACAGAAGG AGAATCGGTG AATAACGGAA ACCTTACGGC 

801 AGATATGTTT GTTCCGACAT TGCCCGAAAA ACCCGAAAGC AAGCCGATTT 

851 ATAACGGTGT AAGGCAGGTA AGGACCTTTG AATATATAGC AGGCTGTATA 

901 GAAGGCGGAA GAACCGGATG CACCTGCTAT TCGCATCAAG GGACGGCATT 

951 GAAAGAAGTG ACGGAGTTGA TGTGCAAGGA CTATGTAAAA AACGGCTTGC 

1001 CGTTTAACCC ATACAAAGAA GAAAGCCAAG GGCAGGAAGT TCAGCAAAGC 

1051 GCGCAGCAAC ATTCGGACAG GGCGCAAGTT GCCACCTTGG GCGGAAAACC 

1101 GCAGCAGAAC CTAATGTACG ACAATTGGGA AGAACGCGGG AAACCGTTTG 

1151 AAGGAATCGG CGGGGGCGTG GTCGGATCGG CAAACTGA

This encodes a protein having amino acid sequence <SEQ ID 326>: 

1 MAEICLITGT PGSGKTLKMV SMMANDEMFK PDENGVRRKV FTNIKGLKIP

51 HTHIETDAKK LPKSTDEQLS AHDMYEWIKK PENVGAIVIV DEAQDVWPAR

101 SAGSKIPENV QWLNTHRHQG IDIFVLTQGP KLLDQNLRTL VKRHYHIAAN

151 KMGLRTLLEW KVCADDPVKM ASSAFSSIYT LDKKVYDLYE SAEIHTVNKV

201 KRSKWFYALP VIILLIPLFV GLSYKMLGSY GKKQEEPAAQ ESAATEQQAV

251 LPDKTEGESV NNGNLTADMF VPTLPEKPES KPIYNGVRQV RTFEYIAGCI

301 EGGRTGCTCY SHQGTALKEV TELMCKDYVK NGLPFNPYKE ESQGQEVQQS 

351 AQQHSDRAQV ATLGGKPQQN LMYDNWEERG KPFEGIGGGV VGSAN* 

ORF84ng and ORF84-1 show 95.4% identity in 395 aa overlap: 

                    10        20        30        40        50        60
orf84-1.pep MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGIRRKVFTNIKGLKIPHTYIETDAKK
            |||||||||||||||||||||||||||||||||||:||||||||||||||||:|||||||
orf84ng     MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGVRRKVFTNIKGLKIPHTHIETDAKK
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf84-1.pep LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG
            |||||||||||||||||||||||:|:||||||||||||||||||||||||||||||||||
orf84ng     LPKSTDEQLSAHDMYEWIKKPENVGAIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf84-1.pep IDIFVLTQGPKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT
            |||||||||||||||||||||::|||||:||||:|||||||:||||||||||||||||||
orf84ng     IDIFVLTQGPKLLDQNLRTLVKRHYHIAANKMGLRTLLEWKVCADDPVKMASSAFSSIYT
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf84-1.pep LDKKVYDLYESAEVHTVNKVKRSKWFYTLPVIVLLIPVFVGLSYKMLSSYGKKQEEPAAQ
            |||||||||||||:|||||||||||||:||||:||||:|||||||||:||||||||||||
orf84ng     LDKKVYDLYESAEIHTVNKVKRSKWFYALPVIILLIPLFVGLSYKMLGSYGKKQEEPAAQ
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf84-1.pep ESAATEQQAVLPDKTEGEPVNNGNLTADMFVPTLSEKPESKPIYNGVRQVRTFEYIAGCI
            |||||||||||||||||| ||||||||||||||| |||||||||||||||||||||||||
orf84ng     ESAATEQQAVLPDKTEGESVNNGNLTADMFVPTLPEKPESKPIYNGVRQVRTFEYIAGCI
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf84-1.pep EGGRTGCACYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV
            |||||||:||||||||||||||||||||||||||||||||||||||||||||||||||||
orf84ng     EGGRTGCTCYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV
                   310       320       330       340       350       360

                   370       380       390
orf84-1.pep ATLGGKPXQNLMYDNWEERGKPFEGIGGGVVGSANX
            ||||||| ||||||||||||||||||||||||||||
orf84ng     ATLGGKPQQNLMYDNWEERGKPFEGIGGGVVGSANX
                   370       380       390

Based on this analysis, including the presence of a putative transmembrane domain
(single-underlined) in the gonococcal protein, and a putative ATP/GTP-binding site motif A (P-loop,
double-underlined), it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
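
The ATP/GTP-binding site motif A (P-loop) referred to above is commonly described by the PROSITE consensus pattern [AG]-x(4)-G-K-[ST]. The sketch below assumes that this is the pattern the original computer analysis applied, which the text does not state. It expresses the pattern as a Python regular expression and scans the N-terminal residues of ORF84-1; the function name `find_p_loops` is illustrative.

```python
import re

# P-loop consensus [AG]-x(4)-G-K-[ST] written as a regular expression.
# The lookahead lets overlapping occurrences be reported as well.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")


def find_p_loops(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched residues) for each hit."""
    return [(m.start() + 1, m.group(1)) for m in P_LOOP.finditer(protein)]


# Residues 1-50 of ORF84-1 (SEQ ID 322).
n_terminus = "MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGIRRKVFTNIKGLKIP"
print(find_p_loops(n_terminus))  # prints [(9, 'GTPGSGKT')]
```

The single hit spans residues 9 to 16, which are conserved in all four ORF84 sequences listed in this example.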



Example 39 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 327>:

1 GTGGTTTTCC TGAATGCCGA CAACGGGATA TTGGTTCAGG ACTTGCCTTT 
51 TGAAGTCAAA CTGAAAAAAT TCCATATCGA TTTTTACAAT ACGGGTATGC 
101 CGCGTGATTT CGCCAGCGAT ATTGAAGTGA CGGACAAGGC AACCGGTGAG
151 AAACTCGAGC GCACCATCCG CGTGAACCAT CCTTTGACCT TGCACGGCAT

201 CACGATTTAT CAGGCGAGTT TTGCCGACGG CGGTTCGGAT TTGACATTCA 

251 AGGCGTGGAA TTTGGGTGAT GCTTCGCGCG AGCCTGTCGT GTTGAAGGCA 

301 ACATCCATAC ACCAGTTTCC GTTGGAAATT GGCAAACACA AATATCGTCT 

351 TGAGTTCGAT CAGTTCACTT CTATGAATGT GGAGGACATG AGCGAGGGCG 

401 CGGAACGGGA AAAAAGCCTG AAATCCACGC TGCCCGATGT CCGCGCCGTT 

451 ACTCAGGAAG GTCACAAATA CACCAAT TACCG 

501 TATCCGTGAT GCGCCAGGCC AGGCGGTCGA ATATAAAAAC TATATGCTGC 

551 CGGTTTTGCA GGAACAGGAT TATTTTTGGA TTACCGGCAC GCGCAGCGC.

601 TTGCAGCAGC AATACCGCTG GCTGCGTATC CCCTTGGACA AGCAGTTGAA 

651 AGCGGACACC TTTATGGCAT TGCGTGAGTT TTTGAAAGAT GGGGAAGGGC 

701 GCAAACGTCT .GTTGCCGAC GCAACCAAAG GCGCACCTGC CGAAATCCGC

751 GAACAATTCA TGCTGGCTGC GGAAAACACG CTGAACATCT TTGCACAAAA

801 AGGCTATTTG GGATTGGACG AATTTATTAC GTCCAATATC CCGAAAGAGC 

851 AGCAGGATAA GATGCAGGGC TATTTCTACG AAATGCTTTA CGGCGTGATG 

901 AACGCTGCTT TGGATGAAAC CAT.ACCCGG TACGGCTTGC CCGAATGGCA 

951 GCAGGATGAA GCGCGGAATC GTTTCCTGCT GCACAGTATG GATGCGTACA 

1001 CGGGTTTGAC CGAATATCCC GCGCCTATGC TGCTGCAACT TGATGGGTTT 

1051 TCCGAGGTGC GTTCGTCGGG TTTGCAGATG ACCCGTTCCC C.GGTCCGCT 

1101 TTTGGTCTAT CTC... 

This corresponds to the amino acid sequence <SEQ ID 328; ORF88>: 

1 MVFLNADNGI LVQDLPFEVK LKKFHIDFYN TGMPRDFASD IEVTDKATGE 

51 KLERTIRVNH PLTLHGITIY QASFADGGSD LTFKAWNLGD ASREPVVLKA

101 TSIHQFPLEI GKHKYRLEFD QFTSMNVEDM SEGAEREKSL KSTLPDVRAV 

151 TQEGHKYTNX XXXXXYRIRD APGQAVEYKN YMLPVLQEQD YFWITGTRSX 

201 LQQQYRWLRI PLDKQLKADT FMALREFLKD GEGRKRXVAD ATKGAPAEIR 

251 EQFMLAAENT LNIFAQKGYL GLDEFITSNI PKEQQDKMQG YFYEMLYGVM 

301 NAALDETXTR YGLPEWQQDE ARNRFLLHSM DAYTGLTEYP APMLLQLDGF 

351 SEVRSSGLQM TRSXGPLLVY L...

Further work revealed the complete nucleotide sequence <SEQ ID 329>: 

1 ATGAGTAAAT CCCGTAGATC TCCCCCACTT CTTTCCCGTC CGTGGTTCGC 

51 TTTTTTCAGC TCCATGCGCT TTGCAGTCGC TTTGCTCAGT CTGCTGGGTA 

101 TTGCATCGGT TATCGGTACG GTGTTGCAGC AAAACCAGCC GCAGACGGAT 

151 TATTTGGTCA AATTCGGATC GTTTTGGGCG CAGATTTTTG GTTTTCTGGG 

201 ACTGTATGAC GTCTATGCTT CGGCATGGTT TGTCGTTATC ATGATGTTTT 

251 TGGTGGTTTC TACCAGTTTG TGCCTGATTC GCAATGTGCC GCCGTTCTGG 

301 CGCGAAATGA AGTCTTTTCG GGAAAAGGTT AAAGAAAAAT CTCTGGCGGC 

351 GATGCGCCAT TCTTCGCTGT TGGATGTAAA AATTGCGCCC GAGGTTGCCA 

401 AACGTTATCT GGAAGTACAA GGTTTTCAGG GAAAAACCAT TAACCGTGAA

451 GACGGGTCGG TTCTGATTGC CGCCAAAAAA GGCACAATGA ACAAATGGGG

501 CTATATCTTT GCCCATGTTG CTTTGATTGT CATTTGCCTG GGCGGGTTGA 

551 TAGACAGTAA CCTGCTGTTG AAACTGGGTA TGCTGACCGG TCGGATTGTT 

601 CCGGACAATC AGGCGGTTTA TGCCAAGGAT TTCAAGCCCG AAAGTATTTT 

651 GGGTGCGTCC AATCTCTCAT TTAGGGGCAA CGTCAATATT TCCGAGGGGC 

701 AGAGTGCGGA TGTGGTTTTC CTGAATGCCG ACAACGGGAT ATTGGTTCAG 

751 GACTTGCCTT TTGAAGTCAA ACTGAAAAAA TTCCATATCG ATTTTTACAA 

801 TACGGGTATG CCGCGTGATT TCGCCAGCGA TATTGAAGTG ACGGACAAGG 

851 CAACCGGTGA GAAACTCGAG CGCACCATCC GCGTGAACCA TCCTTTGACC 

901 TTGCACGGCA TCACGATTTA TCAGGCGAGT TTTGCCGACG GCGGTTCGGA 

951 TTTGACATTC AAGGCGTGGA ATTTGGGTGA TGCTTCGCGC GAGCCTGTCG 

1001 TGTTGAAGGC AACATCCATA CACCAGTTTC CGTTGGAAAT TGGCAAACAC 

1051 AAATATCGTC TTGAGTTCGA TCAGTTCACT TCTATGAATG TGGAGGACAT 

1101 GAGCGAGGGC GCGGAACGGG AAAAAAGCCT GAAATCCACG CTGAACGATG 

1151 TCCGCGCCGT TACTCAGGAA GGTAAAAAAT ACACCAATAT CGGCCCTTCC 

1201 ATTGTTTACC GTATCCGTGA TGCGGCAGGG CAGGCGGTCG AATATAAAAA 

1251 CTATATGCTG CCGGTTTTGC AGGAACAGGA TTATTTTTGG ATTACCGGCA 

1301 CGCGCAGCGG CTTGCAGCAG CAATACCGCT GGCTGCGTAT CCCCTTGGAC 

1351 AAGCAGTTGA AAGCGGACAC CTTTATGGCA TTGCGTGAGT TTTTGAAAGA 

1401 TGGGGAAGGG CGCAAACGTC TGGTTGCCGA CGCAACCAAA GGCGCACCTG

1451 CCGAAATCCG CGAACAATTC ATGCTGGCTG CGGAAAACAC GCTGAACATC

1501 TTTGCACAAA AAGGCTATTT GGGATTGGAC GAATTTATTA CGTCCAATAT 

1551 CCCGAAAGAG CAGCAGGATA AGATGCAGGG CTATTTCTAC GAAATGCTTT 

1601 ACGGCGTGAT GAACGCTGCT TTGGATGAAA CCATACGCCG GTACGGCTTG 

1651 CCCGAATGGC AGCAGGATGA AGCGCGGAAT CGTTTCCTGC TGCACAGTAT 

1701 GGATGCGTAC ACGGGTTTGA CCGAATATCC CGCGCCTATG CTGCTGCAAC 

1751 TTGATGGGTT TTCCGAGGTG CGTTCGTCGG GTTTGCAGAT GACCCGTTCC

1801 CCGGGTGCGC TTTTGGTCTA TCTCGGCTCG GTGCTGTTGG TATTGGGTAC

1851 GGTATTGATG TTTTATGTGC GCGAAAAACG GGCGTGGGTA TTGTTTTCAG 

1901 ACGGCAAAAT CCGTTTTGCC ATGTCTTCGG CCCGCAGCGA ACGGGATTTG 

1951 CAGAAGGAAT TTCCAAAACA CGTCGAGAGT CTGCAACGGC TCGGCAAGGA 

2001 CTTGAATCAT GACTGA 

This corresponds to the amino acid sequence <SEQ ID 330; ORF88-1>:

1 MSKSRRSPPL LSRPWFAFFS SMRFAVALLS LLGIASVIGT VLQQNQPQTD

51 YLVKFGSFWA QIFGFLGLYD VYASAWFVVI MMFLVVSTSL CLIRNVPPFW

101 REMKSFREKV KEKSLAAMRH SSLLDVKIAP EVAKRYLEVQ GFQGKTINRE

151 DGSVLIAAKK GTMNKWGYIF AHVALIVICL GGLIDSNLLL KLGMLTGRIV

201 PDNQAVYAKD FKPESILGAS NLSFRGNVNI SEGQSADVVF LNADNGILVQ

251 DLPFEVKLKK FHIDFYNTGM PRDFASDIEV TDKATGEKLE RTIRVNHPLT

301 LHGITIYQAS FADGGSDLTF KAWNLGDASR EPVVLKATSI HQFPLEIGKH

351 KYRLEFDQFT SMNVEDMSEG AEREKSLKST LNDVRAVTQE GKKYTNIGPS

401 IVYRIRDAAG QAVEYKNYML PVLQEQDYFW ITGTRSGLQQ QYRWLRIPLD

451 KQLKADTFMA LREFLKDGEG RKRLVADATK GAPAEIREQF MLAAENTLNI

501 FAQKGYLGLD EFITSNIPKE QQDKMQGYFY EMLYGVMNAA LDETIRRYGL

551 PEWQQDEARN RFLLHSMDAY TGLTEYPAPM LLQLDGFSEV RSSGLQMTRS

601 PGALLVYLGS VLLVLGTVLM FYVREKRAWV LFSDGKIRFA MSSARSERDL

651 QKEFPKHVES LQRLGKDLNH D*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF88 shows 95.7% identity over a 371aa overlap with an ORF (ORF88a) from strain A of N. 
meningitidis: 



                                                  10        20        30
orf88.pep                                 MVFLNADNGILVQDLPFEVKLKKFHIDFYN
                                          :|||||||||||||||||||||||||||||
orf88a      AKDFKPESILGASNLSFRGNVNISEGQSADVVFLNADNGILVQDLPFEVKLKKFHIDFYN
            210       220       230       240       250       260

                    40        50        60        70        80        90
orf88.pep   TGMPRDFASDIEVTDKATGEKLERTIRVNHPLTLHGITIYQASFADGGSDLTFKAWNLGD
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88a      TGMPRDFASDIEVTDKATGEKLERTIRVNHPLTLHGITIYQASFADGGSDLTFKAWNLGD
            270       280       290       300       310       320

                   100       110       120       130       140       150
orf88.pep   ASREPVVLKATSIHQFPLEIGKHKYRLEFDQFTSMNVEDMSEGAEREKSLKSTLPDVRAV
            |||||||||||||||||||||||||||||||||||||||||||||||||||||| |||||
orf88a      ASREPVVLKATSIHQFPLEIGKHKYRLEFDQFTSMNVEDMSEGAEREKSLKSTLNDVRAV
            330       340       350       360       370       380

                   160       170       180       190       200       210
orf88.pep   TQEGHKYTNXXXXXXYRIRDAPGQAVEYKNYMLPVLQEQDYFWITGTRSXLQQQYRWLRI
            ||||:||||      |||||| ||||||||||||||||||||||||||| ||||||||||
orf88a      TQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYMLPVLQEQDYFWITGTRSGLQQQYRWLRI
            390       400       410       420       430       440

                   220       230       240       250       260       270
orf88.pep   PLDKQLKADTFMALREFLKDGEGRKRXVADATKGAPAEIREQFMLAAENTLNIFAQKGYL
            |||||||||||||||||||||||||| |||||||||||||||||||||||||||||||||
orf88a      PLDKQLKADTFMALREFLKDGEGRKRLVADATKGAPAEIREQFMLAAENTLNIFAQKGYL
            450       460       470       480       490       500

                   280       290       300       310       320       330
orf88.pep   GLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAALDETXTRYGLPEWQQDEARNRFLLHSM
            |||||||||||||||||||||||||||||||||||||  |||||||||||||||||||||
orf88a      GLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAALDETIRRYGLPEWQQDEARNRFLLHSM
            510       520       530       540       550       560

                   340       350       360       370
orf88.pep   DAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRSXGPLLVYL
            ||||||||||||||||||||||||||||||||| | |||||
orf88a      DAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRSPGALLVYLGSVLLVLGTVLMFYVREKR
            570       580       590       600       610       620

orf88a      AWVLFSDGKIRFAMSSARSERDLQKEFPKHVESLQRLGKDLNHDX
            630       640       650       660       670

The complete length ORF88a nucleotide sequence <SEQ ID 331> is:

1 ATGAGTAAAT CCCGTAGATC TCCCCCACTT CTTTCCCGTC CGTGGTTCGC

51 TTTTTTCAGC TCCATGCGCT TTGCGGTCGC TTTGCTCAGT CTGCTGGGTA

101 TTGCATCGGT TATCGGTACG GTGTTGCAGC AAAACCAGCC GCAGACGGAT

151 TATTTGGTCA AATTCGGATC GTTTTGGGCG CAGATTTTTG GTTTTCTGGG

201 ACTGTATGAC GTCTATGCTT CGGCATGGTT TGTCGTTATC ATGATGTTTT

251 TGGTGGTTTC TACCAGTTTG TGCCTGATTC GCAATGTGCC GCCGTTCTGG

301 CGCGAAATGA AGTCTTTTCG GGAAAAGGTT AAAGAAAAAT CTCTGGCGGC

351 GATGCGCCAT TCTTCGCTGT TGGATGTAAA AATTGCGCCC GAGGTTGCCA

401 AACGTTATCT GGAAGTACAA GGTTTTCAGG GAAAAACCAT TAACCGTGAA

451 GACGGGTCGG TTCTGATTGC CGCCAAAAAA GGCACAATGA ACAAATGGGG

501 CTATATCTTT GCCCATGTTG CTTTGATTGT CATTTGCCTG GGCGGGTTGA

551 TAGACAGTAA CCTGCTGTTG AAACTGGGTA TGCTGACCGG TCGGATTGTT

601 CCGGACAATC AGGCGGTTTA TGCCAAGGAT TTCAAGCCCG AAAGTATTTT

651 GGGTGCGTCC AATCTCTCAT TTAGGGGCAA CGTCAATATT TCCGAGGGGC

701 AGAGTGCGGA TGTGGTTTTC CTGAATGCCG ACAACGGGAT ATTGGTTCAG

751 GACTTGCCTT TTGAAGTCAA ACTGAAAAAA TTCCATATCG ATTTTTACAA

801 TACGGGTATG CCGCGCGATT TTGCCAGTGA TATTGAAGTA ACGGATAAGG

851 CAACCGGTGA GAAACTCGAG CGCACCATCC GCGTGAACCA TCCTTTGACC

901 TTGCACGGCA TCACGATTTA TCAGGCGAGT TTTGCCGACG GCGGTTCGGA

951 TTTGACATTC AAGGCGTGGA ATTTGGGTGA TGCTTCGCGC GAGCCTGTCG

1001 TGTTGAAGGC AACATCCATA CACCAGTTTC CGTTGGAAAT TGGCAAACAC

1051 AAATATCGTC TTGAGTTCGA TCAGTTTACT TCTATGAATG TGGAGGACAT

1101 GAGCGAGGGC GCGGAACGGG AAAAAAGCCT GAAATCCACG CTGAACGATG

1151 TCCGCGCCGT TACTCAGGAA GGTAAAAAAT ACACCAATAT CGGCCCTTCC

1201 ATTGTTTACC GTATCCGTGA TGCGGCAGGG CAGGCGGTCG AATATAAAAA

1251 CTATATGCTG CCGGTTTTGC AGGAACAGGA TTATTTTTGG ATTACCGGCA

1301 CGCGCAGCGG CTTGCAGCAG CAATACCGCT GGCTGCGTAT CCCCTTGGAC

1351 AAGCAGTTGA AAGCGGACAC CTTTATGGCA TTGCGTGAGT TTTTGAAAGA

1401 TGGGGAAGGG CGCAAACGTC TGGTTGCCGA CGCAACCAAA GGCGCACCTG

1451 CCGAAATCCG CGAACAATTC ATGCTGGCTG CGGAAAACAC GCTGAACATC

1501 TTTGCACAAA AAGGCTATTT GGGATTGGAC GAATTTATTA CGTCCAATAT

1551 CCCGAAAGAG CAGCAGGATA AGATGCAGGG CTATTTCTAC GAAATGCTTT

1601 ACGGCGTGAT GAACGCTGCT TTGGATGAAA CCATACGCCG GTACGGCTTG

1651 CCCGAATGGC AGCAGGATGA AGCGCGGAAT CGTTTCCTGC TGCACAGTAT

1701 GGATGCGTAC ACGGGTTTGA CCGAATATCC CGCGCCTATG CTGCTGCAAC

1751 TTGATGGGTT TTCCGAGGTG CGTTCGTCGG GTTTGCAGAT GACCCGTTCC

1801 CCGGGTGCGC TTTTGGTCTA TCTCGGCTCG GTGCTGTTGG TATTGGGTAC

1851 GGTATTGATG TTTTATGTGC GCGAAAAACG GGCGTGGGTA TTGTTTTCAG

1901 ACGGCAAAAT CCGTTTTGCC ATGTCTTCGG CCCGCAGCGA ACGGGATTTG

1951 CAGAAGGAAT TTCCAAAACA CGTCGAGAGT CTGCAACGGC TCGGCAAGGA

2001 CTTGAATCAT GACTGA



This encodes a protein having amino acid sequence <SEQ ID 332>: 



1 MSKSRRSPPL LSRPWFAFFS SMRFAVALLS LLGIASVIGT VLQQNQPQTD

51 YLVKFGSFWA QIFGFLGLYD VYASAWFVVI MMFLVVSTSL CLIRNVPPFW

101 REMKSFREKV KEKSLAAMRH SSLLDVKIAP EVAKRYLEVQ GFQGKTINRE

151 DGSVLIAAKK GTMNKWGYIF AHVALIVICL GGLIDSNLLL KLGMLTGRIV

201 PDNQAVYAKD FKPESILGAS NLSFRGNVNI SEGQSADVVF LNADNGILVQ

251 DLPFEVKLKK FHIDFYNTGM PRDFASDIEV TDKATGEKLE RTIRVNHPLT

301 LHGITIYQAS FADGGSDLTF KAWNLGDASR EPVVLKATSI HQFPLEIGKH

351 KYRLEFDQFT SMNVEDMSEG AEREKSLKST LNDVRAVTQE GKKYTNIGPS

401 IVYRIRDAAG QAVEYKNYML PVLQEQDYFW ITGTRSGLQQ QYRWLRIPLD

451 KQLKADTFMA LREFLKDGEG RKRLVADATK GAPAEIREQF MLAAENTLNI

501 FAQKGYLGLD EFITSNIPKE QQDKMQGYFY EMLYGVMNAA LDETIRRYGL

551 PEWQQDEARN RFLLHSMDAY TGLTEYPAPM LLQLDGFSEV RSSGLQMTRS

601 PGALLVYLGS VLLVLGTVLM FYVREKRAWV LFSDGKIRFA MSSARSERDL

651 QKEFPKHVES LQRLGKDLNH D*



ORF88a and ORF88-1 show 100.0% identity in 671 aa overlap:






orf88a.pep  MSKSRRSPPLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGSFWA  60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1     MSKSRRSPPLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGSFWA  60

orf88a.pep  QIFGFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH  120
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1     QIFGFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH  120

orf88a.pep  SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL  180
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1     SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL  180

orf88a.pep  GGLIDSNLLLKLGMLTGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADVVF  240
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1     GGLIDSNLLLKLGMLTGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADVVF  240

orf88a.pep  LNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT  300
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1     LNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT  300

orf88a.pep  LHGITIYQASFADGGSDLTFKAWNLGDASREPVVLKATSIHQFPLEIGKHKYRLEFDQFT  360
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1     LHGITIYQASFADGGSDLTFKAWNLGDASREPVVLKATSIHQFPLEIGKHKYRLEFDQFT  360

orf88a.pep  SMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYML  420
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1     SMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYML  420

orf88a.pep  PVLQEQDYFWITGTRSGLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRLVADATK  480
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1     PVLQEQDYFWITGTRSGLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRLVADATK  480

orf88a.pep  GAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAA  540
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1     GAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAA  540

orf88a.pep  LDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRS  600
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1     LDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRS  600

orf88a.pep  PGALLVYLGSVLLVLGTVLMFYVREKRAWVLFSDGKIRFAMSSARSERDLQKEFPKHVES  660
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1     PGALLVYLGSVLLVLGTVLMFYVREKRAWVLFSDGKIRFAMSSARSERDLQKEFPKHVES  660

orf88a.pep  LQRLGKDLNHD  672
            |||||||||||
orf88-1     LQRLGKDLNHD  672

Homology with a predicted ORF from N. gonorrhoeae
ORF88 shows 93.8% identity over a 371aa overlap with a predicted ORF (ORF88.ng) from N.
gonorrhoeae:



55 



orf 88 .pep MVFLNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNH 60 

I I I I i II I I : I I I I I I I I II I I I I I I I I I I I I I I I I I I I II I I II I I I I I I I I I I I I I I I 

orf88ng MVFLNADNGMLVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNH 60 

orf 88. pep PLTLHGITIYQASFADGGSDLTFKAWNLGDASREPWLKATSIHQFPLEIGKHKYRLEFD 120 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

orf88ng PLTLHGITIYQASFADGGSDLTFKAWNLRDASREPWLKATSIHQFPLEIGKHKYRLEFD 120 



60 orf 88 . pep QFTSMNVEDMSEGAEREKSLKSTLPDVRAVTQEGHKYTNXXXXXXYRIRDAPGQAVEYKN 180 

I 11 I I I I I I I I I I I I I I I I I I I 1 I M I 1 I I 1 I I : I I I I I I I I I I I I I I I I I I 

orf88ng QFTSMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKN 180 

orf 88. pep YMLPVLQEQDYFWITGTRSXLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRXVAD 240 
65 I I I I : I I :: I I I I : I I I I I I I I I I I I I I I I I I I I I I II I I I I i II I I I I I I | | I I Ml 

orf88ng YMLPILQDKDYFWLTGTRSGLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRLVAD 240 
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ATKGAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKEQQDKMQGYFYEMLYGVM 
| M M II I I 1 I 1 I I I 1 i M ( i ( I I t I i I 1 I I I I t ( I M M I M I I I I I M I I I M 



300 



orf88 .pep 
orf88ng 
orf 88 .pep 
orf88ng 
orf88 .pep 
orf 88ng 

An ORF88ng nucleotide sequence <SEQ ID 333> was predicted to encode a protein having amino 



ATKDAPAE I REQFML^N™ 

NAALDETXTRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQM 

11)1111 | | | I I II I 1 I II I II I M I I I M I I I I I II I I M I I I I I M I M I I I I I I I 
NAALDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQM 

TRSXGPLLVYL 
III I Mill 

TRSPGALLVYLGSVLLVLGTVFMFYVPKKRAWVLFSNXKIRFAMSSARSERDLQKEFPKH 



300 
360 
360 
371 
420 



acid sequence <SEQ ID 334>: 



1 MVFLNADNGM LVQDLPFEVK LKKFHIDFYN TGMPRDFASD IEVTDKATGE
51 KLERTIRVNH PLTLHGITIY QASFADGGSD LTFKAWNLRD ASREPWLKA
101 TSIHQFPLEI GKHKYRLEFD QFTSMNVEDM SEGAEREKSL KSTLNDVRAV
151 TQEGKKYTNI GPSIVYRIRD AAGQAVEYKN YMLPILQDKD YFWLTGTRSG
201 LQQQYRWLRI PLDKQLKADT FMALREFLKD GEGRKRLVAD ATKDAPAEIR
251 EQFMLAAENT LNIFAQKGYL GLDEFITSNI PKGQQDKMQG YFYEMLYGVM
301 NAALDETIRR YGLPEWQQDE ARNRFLLHSM DAYTGLTEYP APMLLQLDGF
351 SEVRSSGLQM TRSPGALLVY LGSVLLVLGT VFMFYVPKKR AWVLFSNXKI
401 RFAMSSARSE RDLQKEFPKH VESLQRLGKD LNHD*



Further work revealed the complete gonococcal DNA sequence <SEQ ID 335>: 



1 ATGAGTAAAT CCCGTATATC TCCCACACTT CTTTCCCGTC CGTGGTTCGC
51 TTTTTTCAGC TCCATGCGCT TTGCGGTCGC TTTGCTCAGT CTGCTGGGTA
101 TTGCATCGGT TATCGGCACG GTGTTACAGC AAAACCAGCC GCAGACGGAT
151 TATTTGGTCA AATTCGGACC GTTTTGGACT CGGATTTTTG ATTTTTTGGG
201 TTTGTATGAT GTCTATGCTT CGGCATGGTT TGTCGTTATC ATGATGTTTC
251 TGGTGGTTTC TACCAGTTTG TGTTTAATCC GTAACGTTCC GCCGTTTTGG
301 CGCGAAATGA AGTCTTTCCG GGAAAAGGTT AAAGAAAAAT CTCTGGCGGC
351 GATGCGCCAT TCTTCGCTGT TGGATGTAAA AATTGCCCCC GAAGTTGCCA
401 AACGTTATCT GGAGGTGCGG GGTTTTCAGG GAAAAACCGT CAGCCGTGAG
451 GACGGGTCGG TTCTGATTGC CGCCAAAAAA GGCAcaatga acaaATGGGG
501 CTATATCTTT GCccaagtag CtTTGATTGT CATTTGCCTG GGCGGGTTGA
551 TAGACAGTAA CCTGCTGCTG AAGCTGGGTA TGCTGGCCGG TCGGATTGTT
601 CCGGACAATC AGGCGGTTTA TGCCAAGGAT TTCAAGCCCG AAAGTATTTT
651 GGGTGCGTCC AATCTCTCAT TTAGGGGCAA CGTCAATATT TCCGAGGGGC
701 AAAGTGCGGA TGTGGTTTTC CTGAATGCCG ACAACGGGAT GTTGGTTCAG
751 GACTTGCCTT TTGAAGTCAA ACTGAAAAAA TTCCATATCG ATTTTTACAA
801 TACGGGTATG CCGCGCGATT TTGCCAGCGA TATTGAAGTA ACGGACAAGG
851 CAACCGGTGA GAAACTCGAG CGCACCATCC GCGTGAACCA TCCTTTGACC
901 TTGCACGGCA TCACGATTTA TCAGGCGAGT TTTGCCGACG GCGGTTCGGA
951 TTTGACATTC AAGGCGTGGA ATTTGAGGGA TGCTTCGCGC GAACCTGTCG
1001 TGTTGAAGGC AACCTCCATA CACCAGTTTC CGTTGGAAAT CGGCAAACAC
1051 AAATATCGTC TTGAGTTCGA TCAGTTCACT TCTATGAATG TGGAGGACAT
1101 GAGCGAGGGT GCGGAACGGG AAAAAAGCCT GAAATCCACT CTGAACGATG
1151 TCCGCGCCGT TACTCAGGAA GGTAAAAAAT ACACCAATAT CGGCCCTTCC
1201 ATCGTGTACC GCATCCGTGA TGcggCAGGG CAGGCGGTCG AATATAAAAA
1251 CTATATGCTG CCGATTTTGC AGGACAAAGA TTATTTTTGG CTGACCGGCA
1301 CGCGCAGCGG CTTGCAGCAG CAATACCGCT GGCTGCGTAT CCCCTTGGAC
1351 AAGCAGTTGA AAGCGGACAC CTTTATGGCA TTGCGTGAGT TTTTGAAAGA
1401 TGGGGAAGGG CGCAAACGTC TGGTTGCCGA CGCAACCAAA GACGCACCTG
1451 CCGAAATCCG CGAACAATTC ATGCTGGCTG CGGAAAACAC GCTGAATATC
1501 TTTGCGCAAA AAGGCTATTT GGGATTGGAC GAATTTATTA CGTCCAATAT
1551 CCCGAAAGGG CAGCAGGATA AGATGCAGGG CTATTTCTAC GAAATGCTTT
1601 ACGGCGTGAT GAACGCTGCT TTGGATGAAA CCATACGCCG GTACGGCTTG
1651 CCCGAATGGC AGCAGGATGA AGCGCGGAAC CGTTTCCTGC TGCACAGTAT
1701 GGATGCCTAT ACGGGGCTGA CGGAATATCC CGCGCCTATG CTGCTCCAGC
1751 TTGACGGGTT TTCCGAGGTG CGTTCCTCAG GTTTGCAGAT GACCCGTTCG
1801 CCGGGTGCGC TTTTGGTCTA TCtcggctcg gtattgttgg TTTTGGgtac
1851 ggtaTttatg tTTTATGTGC GCGAAAAACG GGCGTGGgta tTGTTTTCag
1901 aCGGCAAAAT CCGTTTTGCT ATGtCTTcgg CCcgcagcga ACGGGATTTG
1951 cAGAaggaaT TTCCAAAACA CGtcgAGAGC CTGCAACggc tcggcaaggA
2001 CttgaaTCAT GACTga

This corresponds to the amino acid sequence <SEQ ID 336; ORF88ng-1>:

1 MSKSRISPTL LSRPWFAFFS SMRFAVALLS LLGIASVIGT VLQQNQPQTD
51 YLVKFGPFWT RIFDFLGLYD VYASAWFWI MMFLWSTSL CLIRNVPPFW
101 REMKSFREKV KEKSLAAMRH SSLLDVKIAP EVAKRYLEVR GFQGKTVSRE
151 DGSVLIAAKK GTMNKWGYIF AQVALIVICL GGLIDSNLLL KLGMLAGRIV
201 PDNQAVYAKD FKPESILGAS NLSFRGNVNI SEGQSADVVF XNADNGMLVQ
251 DLPFEVKLKK FHIDFYNTGM PRDFASDIEV TDKATGEKLE RTIRVNHPLT
301 LHGITIYQAS FADGGSDLTF KAWNLRDASR EPVVLKATSI HQFPLEIGKH
351 KYRLEFDQFT SMNVEDMSEG AEREKSLKST LNDVRAVTQE GKKYTNIGPS
401 IVYRIRDAAG QAVEYKNYML PILQDKDYFW LTGTRSGLQQ QYRWLRIPLD
451 KQLKADTFMA LREFLKDGEG RKRLVADATK DAPAEIREQF MLAAENTLNI
501 FAQKGYLGLD EFITSNIPKG QQDKMQGYFY EMLYGVMNAA LDETIRRYGL
551 PEWQQDEARN RFLLHSMDAY TGLTEYPAPM LLQLDGFSEV RSSGLQMTRS
601 PGALLVYLGS VLLVLGTVFM FYVREKRAWV LFSDGKIRFA MSSARSERDL
651 QKEFPKHVES LQRLGKDLNH D*



ORF88ng-1 and ORF88-1 show 97.0% identity in 671 aa overlap:

orf88-1.pep MSKSRRSPPLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGSFWA 60
orf88ng-1 MSKSRISPTLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGPFWT 60

orf88-1.pep QIFGFLGLYDVYASAWFWIMMFLWSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 120
orf88ng-1 RIFDFLGLYDVYASAWFWIMMFLWSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 120

orf88-1.pep SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL 180
orf88ng-1 SSLLDVKIAPEVAKRYLEVRGFQGKTVSREDGSVLIAAKKGTMNKWGYIFAQVALIVICL 180

orf88-1.pep GGLIDSNLLLKLGMLTGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADVVF 240
orf88ng-1 GGLIDSNLLLKLGMLAGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADVVF 240

orf88-1.pep LNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT 300
orf88ng-1 LNADNGMLVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT 300

orf88-1.pep LHGITIYQASFADGGSDLTFKAWNLGDASREPWLKATSIHQFPLEIGKHKYRLEFDQFT 360
orf88ng-1 LHGITIYQASFADGGSDLTFKAWNLRDASREPWLKATSIHQFPLEIGKHKYRLEFDQFT 360

orf88-1.pep SMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYML 420
orf88ng-1 SMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYML 420

orf88-1.pep PVLQEQDYFWITGTRSGLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRLVADATK 480
orf88ng-1 PILQDKDYFWLTGTRSGLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRLVADATK 480

orf88-1.pep GAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAA 540
orf88ng-1 DAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKGQQDKMQGYFYEMLYGVMNAA 540

orf88-1.pep LDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRS 600
orf88ng-1 LDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRS 600

orf88-1.pep PGALLVYLGSVLLVLGTVLMFYVREKRAWVLFSDGKIRFAMSSARSERDLQKEFPKHVES 660
orf88ng-1 PGALLVYLGSVLLVLGTVFMFYVREKRAWVLFSDGKIRFAMSSARSERDLQKEFPKHVES 660

orf88-1.pep LQRLGKDLNHD 671
orf88ng-1 LQRLGKDLNHD 671
Furthermore, ORF88ng-1 shows homology with a hypothetical protein from Aquifex aeolicus:

gi|2984296 (AE000771) hypothetical protein [Aquifex aeolicus] Length = 537
Score = 94.4 bits (231), Expect = 2e-18

Identities = 91/334 (27%), Positives = 159/334 (47%), Gaps = 59/334 (17%)

FAFFSSMRFAVALLSLLGIASVIG-TVLQQNQPQTDYLVKFGPFWTRIFDFLGLYDVYAS 7 4 
+ F +s ++ A+ ++ +LGI S++G T ++QNQ YL +FG L L DV+ S 



++++ ++ L V+ C 1+ +P W++ S +E++ + A +H + VKI P+ K 



++L + g F+ v E + + A+KG ++ G +AL+VI G LID 
LKFLLKKGFK-VFVEEEGNKLYVFAEKGRFSRLGVYITHIALLVIMAGALID 24 9 



+I+G RG++ ++EG + DV+ + A+ L 

-AIVGV RGSLIVAEGDTNDVMLVGAE— QKPYKL 280 



Query: 


16 


Sbjct : 


80 


Query: 


75 


Sbjct: 


140 


Query: 


135 


Sbjct: 


198 


Query: 


193 


Sbjct: 


250 


Query: 


253 


Sbjct: 


281 


Query: 


301 


Sbjct: 


338 



PFEVKLKKFHIDFY NTGMPRDFA SDIEVTDKATGEKLER — TIRVNHPLT 300 

PFVLFIY N++FA SDIE+ + G K+E T++VN P 



++QA++ DG S + + + A +P 

^YRLFQATYGILDGTSGMGVIWDRKKAHEDP 371 

Based on this analysis, including the putative transmembrane domain in the gonococcal protein,
it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could
be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 40 



The following DNA sequence, believed to be complete, was identified in N.meningitidis <SEQ ID 
337>: 



1 ATGATGAGTA ATAmAATGGm ACAAAAAGGG TTTACATTGA TTGmGmTGAT 

51 GATAGTCGTC GCGATACTCG GCATTATCAG CGTCATTGCC ATACCTTCTT 

101 ATCmAAGTTA TATTGAAAAA GGCTATCAGT CCCAGCTTTA TACGGAGATG 

151 GyCGGTATCA ACAATATTTC CAAACAGTTT ATTTTGAAAA ATCCCCTGGA 

201 CGATAATCAG ACCATCGAGA ACAAACTGGA AATATTTGTC TCAGGCTATA

251 AGATGAATCC GAAAATTGCC AAAAAaTATA GTGTTTCGGT AAAGTTTGTC 

301 GATAAGGAAA AATCAAGGGC ATACAGGTTG GTCGGCGTTC CGAAGGCGGG 

351 GACGGGTTAT ACTTTGTCGG TATGGATGAA CAGCGTGGGC GACGGATACA 

401 AATGCCGTGA TGCCGCTTCT GCCCAAGCCC ATTTGGAGAC CTTGTCCTCA 

451 GATGTCGGCT GTGAAGCCTT CTCTAATCGT AAAAAATAA 

This corresponds to the amino acid sequence <SEQ ID 338; ORF89>: 



1 MMSNXMXQKG FTLIXXMIVV AILGIISVIA IPSYXSYIEK GYQSQLYTEM

51 XGINNISKQF ILKNPLDDNQ TIENKLEIFV SGYKMNPKIA KKYSVSVKFV 

101 DKEKSRAYRL VGVPKAGTGY TLSVWMNSVG DGYKCRDAAS AQAHLETLSS 

151 DVGCEAFSNR KK* 

Further work revealed the complete nucleotide sequence <SEQ ID 339>: 

1 ATGATGAGTA ATAAAATGGA ACAAAAAGGG TTTACATTGA TTGAGATGAT 

51 GATAGTCGTC GCGATACTCG GCATTATCAG CGTCATTGCC ATACCTTCTT 

101 ATCAAAGTTA TATTGAAAAA GGCTATCAGT CCCAGCTTTA TACGGAGATG 

151 GTCGGTATCA ACAATATTTC CAAACAGTTT ATTTTGAAAA ATCCCCTGGA 
201 CGATAATCAG ACCATCGAGA ACAAACTGGA AATATTTGTC TCAGGCTATA

251 AGATGAATCC GAAAATTGCC AAAAAATATA GTGTTTCGGT AAAGTTTGTC 

301 GATAAGGAAA AATCAAGGGC ATACAGGTTG GTCGGCGTTC CGAAGGCGGG 

351 GACGGGTTAT ACTTTGTCGG TATGGATGAA CAGCGTGGGC GACGGATACA 

401 AATGCCGTGA TGCCGCTTCT GCCCAAGCCC ATTTGGAGAC CTTGTCCTCA 

451 GATGTCGGCT GTGAAGCCTT CTCTAATCGT AAAAAATAA

This corresponds to the amino acid sequence <SEQ ID 340; ORF89-1>:

1 MMSNKMEQKG FTLIEMMIVV AILGIISVIA IPSYQSYIEK GYQSQLYTEM

51 VGINNISKQF ILKNPLDDNQ TIENKLEIFV SGYKMNPKIA KKYSVSVKFV

101 DKEKSRAYRL VGVPKAGTGY TLSVWMNSVG DGYKCRDAAS AQAHLETLSS

151 DVGCEAFSNR KK* 
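
The correspondence between a listed nucleotide sequence and its amino acid sequence can be checked with a short script. The following is a minimal sketch, assuming the standard genetic code and reading frame 1; the helper names `parse_listing` and `translate` are illustrative and are not part of the original analysis. It uses the first two rows of <SEQ ID 339>.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def parse_listing(text):
    """Join a numbered, ten-per-block sequence listing into one uppercase string."""
    blocks = []
    for line in text.splitlines():
        fields = line.split()
        # Drop the leading position label (e.g. "1", "51") when present.
        if fields and fields[0].isdigit():
            fields = fields[1:]
        blocks.extend(fields)
    return "".join(blocks).upper()


def translate(dna):
    """Translate in frame 1; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


# First two rows of <SEQ ID 339> as listed above.
SEQ_339_START = """
1 ATGATGAGTA ATAAAATGGA ACAAAAAGGG TTTACATTGA TTGAGATGAT
51 GATAGTCGTC GCGATACTCG GCATTATCAG CGTCATTGCC ATACCTTCTT
"""

dna = parse_listing(SEQ_339_START)
print(translate(dna[:60]))
```

The first 60 nucleotides give the first 20 residues of ORF89-1. Ambiguous positions in the partial sequences (for example `m` or `y`) translate to `X` here, which mirrors how unresolved residues are shown in the listings.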

Computer analysis of this amino acid sequence gave the following results: 
Homology with PilE of N. gonorrhoeae (accession number Z69260).
ORF89 and PilE protein show 30% aa identity in 120aa overlap:

orf89 8 QKGFTLIXXMIVVAILGIISVIAIPSYXSYIEKGYQSQLYTEMXGINNISKQFILKNPL- 66

QKGFTLI MIV+AI+GI++ +A+P+Y Y + S+ G + ++L++ 

PilE 5 QKGFTLIELMIVIAIVGILAAVALPAYQDYTARAQVSEAILLAEGQKSAVTEYYLNHGIW 64 

orf89 67 -DDNQTIENKLEIFVSGYKMNPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGYTLSVW 125 

DN + +G + KI KY SV + GV K G LS+W 

PilE 65 PKDNTS AGVASSDKIKGKYVQSVTVAKGWTAEMASTGVNKEIQGKKLSLW 115 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF89 shows 83.3% identity over a 162aa overlap with an ORF (ORF89a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf89 pep MMSNXMXQKGFTLIXXMIWAILGIISVIAIPSYXSYIEKGYQSQLYTEMXGINNISKQF 

I t | ! | | N ) ! I I I I it III I 1 i I I I I I I ( I 1 t I I t I 1 I I I I M I 
orf89a MMSNKMEQKGFTLIXXXXXXAIXXXXSVIXXXXYXSYIEKGYQSQLYTEMVGINNISKQX 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf89.pep I LKN PLDDNQT IENKLE I FVSGYKMNPKI AKKYS VS VKFVDKEKSRAYRLVGVPKAGTGY 
|||||||||tM::IIMIMMIIIIlll:ll:lll:l|::tl III 1)1111:1111 
orf8 9a ILKNPLDDNQTIKSKLEIFVSGYKMNPKIAEKYNVSVHFVNEEKPRAYSLVGVPKTGTGY 

70 80 90 100 110 120 

130 140 150 160 

orf 89 . pep T LS VWMN SVGDGYKCRDAAS AQAHLETLSS DVGCEAFSNRKKX 

II I I I II I I I I I t I! I ! I I I I : t M M I 1 I It 1 I it II II I I I 
orf 89a TLSVWMNSVGDGYKCRDAASARAHLETLSSDVGCEAFSNRKKX 

130 140 150 160 

The complete length ORF89a nucleotide sequence <SEQ ID 341> is:

1 ATGATGAGTA ATAAAATGGA ACAAAAAGGG TTTACATTGA TTGNGANGNT 

51 NATNGNCNTC GCGATACNCN GCNTTANCAG CGTCATTNCN ATNNNTNCNT 

101 ATCNNAGTTA TATTGAAAAA GGCTATCAGT CCCAGCTTTA TACGGAGATG 

151 GTCGGTATCA ACAATATTTC CAAACAGTNT ATTTTGAAAA ATCCCCTGGA 

201 CGATAATCAG ACCATCAAGA GCAAACTGGA AATATTTGTC TCAGGCTATA 

251 AGATGAATCC GAAAATTGCC GAAAAATATA ATGTTTCGGT GCATTTTGTC 

301 AATGAGGAAA AACCNAGGGC ATACAGCTTG GTCGGCGTTC CAAAGACGGG 

351 GACGGGTTAT ACTTTGTCGG TATGGATGAA CAGCGTGGGC GACGGATACA 

401 AATGCCGTGA TGCCGCTTCT GCCCGAGCCC ATTTGGAGAC CTTGTCCTCA 

451 GATGTCGGCT GTGAAGCCTT CTCTAATCGT AAAAAATAG

This encodes a protein having amino acid sequence <SEQ ID 342>: 



1 MMSNKMEQKG FTLIXXXXXX AIXXXXSVIX XXXYXSYIEK GYQSQLYTEM 
51 VGINNISKQX ILKNPLDDNQ TIKSKLEIFV SGYKMNPKIA EKYNVSVHFV
101 NEEKPRAYSL VGVPKTGTGY TLSVWMNSVG DGYKCRDAAS ARAHLETLSS 
151 DVGCEAFSNR KK* 

ORF89a and ORF89-1 show 83.3% identity in 162 aa overlap: 

10 20 30 40 50 60 

orf 8 9a pep MMSNKMEQKGFTLIXXXXXXAIXXXXSVIXXXXYXSYIEKGYQSQLYTEMVGINNISKQX 

I I I I I I ! I I I 1 M I I I 111 I I i I M 1 I I I I I 1 M i i I t I I I I I I 

orf 89-1 MMSNKMEQKGFTLIEMMIWAILGIISVIAIPSYQSYIEKGYQSQLYTEMVGINNISKQF 

10 20 30 40 50 60 



10 



70 80 90 100 110 120 

orf 89a. pep ILKNPLDDNQTIKSKLEIFVSGYKMNPKIAEKYNVSVHFVNEEKPRAYSLVGVPKTGTGY 
I I M I I I I! I II : : I II I III I I I I I M ! I : M : I I I : M : : I I Ml I I I I I I = I i i 1 
orf 89-1 ILKNPLDDNQTIENKLEIFVSGYKMNPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGY 
15 70 80 90 100 110 120 

130 140 150 160 

orf 89a . pep TLSVWMNSVGDGYKCRDAASARAHLETLSSDVGCEAFSNRKKX 
I I I I I 1 I M 1 II I I I i I I I I I : I M I I I I I i I I i I I I 1 I I I I I 
20 orf 89-1 TLSVWMNSVGDGYKCRDAASAQAHLETLSSDVGCEAFSNRKKX 

130 140 150 160 

Homology with a predicted ORF from N. gonorrhoeae

ORF89 shows 84.6% identity over a 162aa overlap with a predicted ORF (ORF89.ng) from N.
gonorrhoeae:

orf 89 MMSNXMXQKGFTLIXXMIWAILGIISVIAIPSYXSYIEKGYQSQLYTEMXGINNISKQF 60 

m I I I i I I I II I i I t I I I : t I I I i I I I t I t I I I I I i I I I I I I t t I I I I I I I '- III 

orf89ng MMSNKMEQKGFTLIEMMIWTILGIISVIAIPSYQSYIEKGYQSQLYTEMVGINNVLKQF 60 

O 30 orf 89 ILKNPLDDNQTIENKLEIFVSGYKMMPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGY 120 

.£ I I I I I I I I : I ::: I I : I I I I I I I 1 I I I I I I I I I I I I : I I I II I I M I I I ! I : I M I I 

s \ s orf 89ng ILKNPQDDNDTLKSKLKIFVSGYKMNPKIAKKYSVSVRFVDAEKPRAYRLVGVPNAGTGY 120 

O orf 89 TLSVWMNSVGDGYKCRDAASAQAHLETLSSDVGCEAFSNRKK 162 

=0 35 I I I I I I I I I I I I I I I I I I : I I I I :: I I I : I I I I I I I I I I I 

if * orf 8 9ng TLSVWMNSVGDGYKCRDATSAQAYSDTLSADSGCEAFSNRKK 1 62 

The complete length ORF89ng nucleotide sequence <SEQ ID 343> is: 

1 aTGATGAGCA ATAAAATGGA ACAAAAAGGG TTTACATTGA TTGAGATGAT 

51 GATAGTTGTC ACGATACTCG GCATCATCAG CGTCATTGCC ATACCTTCTT 

101 ATCAGAGTTA TATTGAAAAA GGCTATCAGT CCCAGCTTTA TACGGAGATG

151 GTCGGTATCA ACAATGTTCT CAAACAGTTT ATTTTGAAAA ATCCCCAGGA

201 CGATAATGAT ACCCTCAAGA GCAAACTGAA AATATTTGTC TCAGGCTATA

251 AGATGAATCC GAAAAttgCC AAAAAATATA GTGTTTCGGt aaggtttGTC

301 gatGCGGAAA AACCAAGGGC ATACAGGTTG GTCGGCGTTC CGAACGCGGG

351 GACGGGTTAT ACTTTGTCGG TATGGATGAA CAGCGTGGGC GACGGATACA

401 AATGCCGTGA TGCCACTTCT GCCCAGGCCT ATTCGGACAC CTTGTCCGCA

451 GATAGCGGCT GTGAAGCTTT CTCTAATCGT AAAAAATAG

This encodes a protein having amino acid sequence <SEQ ID 344>:

1 MMSNKMEQKG FTLIEMMIVV TILGIISVIA IPSYQSYIEK GYQSQLYTEM
51 VGINNVLKQF ILKNPQDDND TLKSKLKIFV SGYKMNPKIA KKYSVSVRFV

101 DAEKPRAYRL VGVPNAGTGY TLSVWMNSVG DGYKCRDATS AQAYSDTLSA
151 DSGCEAFSNR KK*

This gonococcal protein has a putative leader peptide (underlined) and N-terminal methylation site
(NMePhe or type-4 pili, double-underlined). In addition, ORF89ng and ORF89-1 show 88.3%
identity in 162 aa overlap:
10 20 30 40 50 60

orf89-l pep MMSNKMEQKGFTLIEMMIWAILGIISVIAIPSYQSYIEKGYQSQLYTEMVGINNISKQF 
| | t I I I I t t I II I t t t it I I : I I I I t I t i I I I II I I II I I I I I II M I I I I I I II : III 

orf8 9ng mmsnkmeqkgftliemmiwtilgiisviaipsyqsyiekgyqsqlytemvginnvlkqf 

10 20 30 40 50 60 

70 80 90 100 110 120 

or f 8 9-1 pep ilknplddnqtienkleifvsgykmnpkiakkysvsvkfvdkeksrayrlvgvpkagtgy 

Kill | I i : I ::: I I : I I I I I I I I I I I I I I I I I I I I : I M II I I I I I I I I I : I I I I I 
orf8 9ng ILKNPQDDNDTLKSKLKIFVSGYKMNPKIAKKYSVSVRFVDAEKPRAYRLVGVPNAGTGY 
70 80 90 100 110 120 

130 140 150 160 

orf 89-1 . pep T LS VWMN S VGDG YKCRDAAS AQAHLET L S S DVGCE AFSNRKKX 
II I I II II I I I I I I I I I I : I I M : : I I I : I I I I I I I I I I I I 
orf89ng TLSVWMNSVGDGYKCRDATSAQAYSDTLSADSGCEAFSNRKKX 

130 140 150 160 

Based on this analysis, including the gonococcal motifs and the homology with the known PilE 
protein, it was predicted that these proteins from N. meningitidis and N. gonorrhoeae, and their 
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



ORF89-1 (13.6kDa) was cloned in the pGex vector and expressed in E.coli, as described above.
The products of protein expression and purification were analyzed by SDS-PAGE. Figure 11A
shows the results of affinity purification of the GST-fusion protein. Purified GST-fusion protein
was used to immunise mice, whose sera gave a positive result in the ELISA test, confirming that
ORF89-1 is a surface-exposed protein, and that it is a useful immunogen.



Example 41 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 345>: 

1 ATGAAAAAAT CCTCCCTCAT CAGCGCATTG GGCATCGGTA TTTTGAGCAT 

51 CGGCATGGCA TTTGCCGCCC CTGCCGACGC GGTAAGCCAA ATCCGTCAAA 

101 ACGCCACTCA AGTATTGAGC ATCTTAAAAA ACGGCGATGC CAACACCGCT 

151 CGCCAAAAAG CCGAAGCCTA TGCGATTCCC TATTTCGATT TCCAACGTAT 

201 GACCGCATTG GCGGTCGGCA ACCCTTGGsG CACCG.GTCC GACG.GCAAA 

251 AACAAGCGTT GGCCn.AGAA TTTCAACCC. . . 

This corresponds to the amino acid sequence <SEQ ID 346; ORF91>: 

1 MKKSSLISAL GIGILSIGMA FAAPADAVSQ IRQNATQVLS ILKNGDANTA 

51 RQKAEAYAIP YFDFQRMTAL AVGNPWXTXS DXQKQALAXE FQP... 

Further work revealed the complete nucleotide sequence <SEQ ID 347>: 

1 ATGAAAAAAT CCTCCCTCAT CAGCGCATTG GGCATCGGTA TTTTGAGCAT 

51 CGGCATGGCA TTTGCCGCCC CTGCCGACGC GGTAAGCCAA ATCCGTCAAA 

101 ACGCCACTCA AGTATTGAGC ATCTTAAAAA ACGGCGATGC CAACACCGCT 

151 CGCCAAAAAG CCGAAGCCTA TGCGATTCCC TATTTCGATT TCCAACGTAT 

201 GACCGCATTG GCGGTCGGCA ACCCTTGGCG CACCGCGTCC GACGCGCAAA 

251 AACAAGCGTT GGCCAAAGAA TTTCAAACCC TGCTGATCCG CACCTATTCC 

301 GGCACGATGC TGAAATTAAA AAACGCCAAC GTCAACGTCA AAGACAATCC 

351 CATCGTCAAT AAAGGCGGCA AAGAAATCAT CGTCCGCGCC GAAGTCGGCG 

401 TACCCGGGCA AAAACCCGTC AACATGGACT TCACCACCTA CCAAAGCGGC

451 GGTAAATACC GTACCTACAA CGTCGCCATC GAAGGCGCGA GCCTGGTTAC 

501 CGTGTACCGC AACCAATTCG GCGAAATTAT CAAAGCGAAA GGCGTGGACG 

551 GACTGATTGC CGAGTTGAAA GCCAAAAACG GCGGCAAATA A 






This corresponds to the amino acid sequence <SEQ ID 348; ORF91-1>:

1 MKKSSLISAL GIGILSIGMA FAAPADAVSQ IRQNATQVLS ILKNGDANTA

51 RQKAEAYAIP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS

101 GTMLKLKNAN VNVKDNPIVN KGGKEIIVRA EVGVPGQKPV NMDFTTYQSG 

151 GKYRTYNVAI EGASLVTVYR NQFGEIIKAK GVDGLIAELK AKNGGK* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF91 shows 92.4% identity over a 92aa overlap with an ORF (ORF91a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf 91 .pep MKKS S L I S ALG IG I LSI GMAFAAPADAVSQ I RQNATQVLS I LKNGDANTARQKAE AYAI P 
I I I M : M i I II II I M I I I I I I I I I I I : I I I ! I I I I I 11 I I i : I I I I I I I 11 I I I I I I I 
orf 91a MKKSSFISALGIGILSIGMAFAAPADAVNQIRQNATQVLSILKSGDANTARQKAEAYAIP 

10 20 30 40 50 60 



70 80 90 

orf 91 . pep YFDFQRMTALAVGNPWXTXSDXQKQALAXEFQP 

I I I I II I I 1 1 1 I 1 I I I I II I M I I I I I I 
orf 91a Y F D FQRMT ALAVGN PWRTAS DAQKQALAKE FQT LL I RT YSGTMLKLKN AN VN VKDN P I VN 

70 80 90 100 110 120 



orf 91a KGGKE 1 1 VRAE VG V P GQK P WMD FTT YQ S GGKY RT YN VA I EGA S L VT V YRNQ FGE 1 1 KAK 

130 140 150 160 170 180 

The complete length ORF91a nucleotide sequence <SEQ ID 349> is:



1 ATGAAAAAAT CCTCCTTCAT CAGCGCATTG GGCATCGGTA TTTTGAGCAT 

51 CGGCATGGCA TTTGCCGCCC CTGCCGACGC GGTAAACCAA ATCCGTCAAA 

101 ACGCCACTCA AGTATTGAGC ATCTTAAAAA GCGGTGATGC CAACACCGCC 

151 CGCCAAAAAG CCGAAGCCTA TGCGATTCCC TATTTCGATT TCCAACGTAT 

201 GACCGCATTG GCGGTCGGCA ACCCTTGGCG CACCGCGTCC GACGCGCAAA 

251 AACAAGCGTT GGCCAAAGAA TTTCAAACCC TGCTGATCCG CACCTATTCC 

301 GGCACGATGC TGAAATTAAA AAACGCCAAC GTCAACGTCA AAGACAATCC 

351 CATCGTCAAT AAAGGCGGCA AAGAAATCAT CGTCCGCGCC GAAGTCGGCG 

401 TACCCGGGCA AAAACCCGTC AACATGGACT TCACCACCTA CCAAAGCGGC 

451 GGTAAATACC GTACCTACAA CGTCGCCATC GAAGGCGCGA GCCTGGTTAC 

501 CGTGTACCGC AACCAATTCG GCGAAATTAT CAAAGCGAAA GGCGTGGACG 

551 GACTGATTGC CGAGTTGAAG GCTAAAAACG GCAGCAAGTA A 

This encodes a protein having amino acid sequence <SEQ ID 350>: 

1 MKKSSFISAL GIGILSIGMA FAAPADAVNQ IRQNATQVLS ILKSGDANTA

51 RQKAEAYAIP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS

101 GTMLKLKNAN VNVKDNPIVN KGGKEIIVRA EVGVPGQKPV NMDFTTYQSG 

151 GKYRTYNVAI EGASLVTVYR NQFGEIIKAK GVDGLIAELK AKNGSK* 

ORF91a and ORF91-1 show 98.0% identity in 196 aa overlap: 



10 20 30 40 50 60 

orf 91a. pep MKKS S FI S ALGIG I LS IGMAFAAPADAVNQ I RQNATQVLS I LKSGDANTARQKAEAYAI P 
I I I I I : I I II I I I I I I I I I I I I I I I I I I : I I I I I I I II I I I I I : I I I I I I I M I II I I I I 
or f 9 1 - 1 MKKSSLI SALGIGI LS IGMAFAAPADAVSQIRQNATQVLSI LKNGDANTARQKAE AYAI P 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 91a . pep YFDFQRMT ALAVGN PWRTAS DAQKQALAKEFQTLLIRTYSGTMLKLKN AN VNVKDNPIVN 
I I I I I M I I M I I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I I | | | | | | | | I | | | | | | | | 
orf 91-1 Y FD FQRMT ALAVGN PWRTAS DAQKQALAKE FQT LLIRTYSG TMLKLKN AN VN VK DN P I VN 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 91a . pep KGGKEI IVRAEVGVPGQKPVNMDFTTYQSGGKYRTYNVAIEGASLVTVYRNQFGEI IKAK 
| | | I I I I I I I I i I I I I I I I I I I I I I I I I I I I I M I I I I I M I I I I I i I I I I I I It I I I I I
or f 9 1 - 1 KGGKEI IVRAEVGVPGQKPVNMDFTT YQSGGKYRTYNVAIEGASLVTVYRNQFGE I IKAK 

130 140 150 160 170 180 

190 

orf 91a . pep GVDGLIAELKAKNGSKX 
II II I M I I I 1 I I I '- I I 
orf 91-1 GVDGLIAELKAKNGGKX 
190 
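
The stated identity figure can be reproduced from the two listed sequences. The sketch below is illustrative only: it assumes an ungapped, position-by-position comparison of the full-length proteins (which is all this particular alignment requires), and the helper names `clean` and `percent_identity` are hypothetical rather than the program used for the original analysis.

```python
def clean(listing):
    """Strip position labels and whitespace from a ten-per-block listing."""
    return "".join(tok for tok in listing.split() if not tok.isdigit())


def percent_identity(a, b):
    """Percentage of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("ungapped comparison needs equal-length sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# <SEQ ID 350> (ORF91a) and <SEQ ID 348> (ORF91-1), stop symbol omitted.
ORF91A = clean("""
1 MKKSSFISAL GIGILSIGMA FAAPADAVNQ IRQNATQVLS ILKSGDANTA
51 RQKAEAYAIP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS
101 GTMLKLKNAN VNVKDNPIVN KGGKEIIVRA EVGVPGQKPV NMDFTTYQSG
151 GKYRTYNVAI EGASLVTVYR NQFGEIIKAK GVDGLIAELK AKNGSK
""")
ORF91_1 = clean("""
1 MKKSSLISAL GIGILSIGMA FAAPADAVSQ IRQNATQVLS ILKNGDANTA
51 RQKAEAYAIP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS
101 GTMLKLKNAN VNVKDNPIVN KGGKEIIVRA EVGVPGQKPV NMDFTTYQSG
151 GKYRTYNVAI EGASLVTVYR NQFGEIIKAK GVDGLIAELK AKNGGK
""")

mismatches = [i + 1 for i, (x, y) in enumerate(zip(ORF91A, ORF91_1)) if x != y]
print(len(ORF91A), mismatches, round(percent_identity(ORF91A, ORF91_1), 1))
```

Four differing positions out of 196 correspond to the 98.0% identity reported for ORF91a and ORF91-1. Alignments that contain gaps would need a proper alignment step before this comparison.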

Homology with a predicted ORF from N. gonorrhoeae
ORF91 shows 84.8% identity over a 92aa overlap with a predicted ORF (ORF91.ng) from N. 
gonorrhoeae: 

orf 91 . pep MKKS SL I SALGI G ILS IGMAFAAPADAVSQI RQNATQVLS I LKNG DANTARQKAEAYAI P 60 
15 : | | i | : I i I I I I I i I I I I I ill : i I I t I : I I i t I i t I I I : I I I : M I : I I I II M I : I 

orf91ng VKKSSFISALGIGILSIGMAFASPADAVGQIRQNATQVLTILKSGDAASARPKAEAYAVP 60 

orf 91 .pep Y FDFQRMTALAVGN PWXTX S DXQKQALAXE FQP 93 
M I I I I I I I I I I I I I I I II I I I I I I III 
20 orf91ng YFDFQRMTALAVGNPWRTAS DAQKQALAKE FQTLLIRTYSGTMLKFKNATVNVKDNPIVN 120 

The complete length ORF91ng nucleotide sequence <SEQ ID 351> is predicted to encode a protein
having amino acid sequence <SEQ ID 352>:

1 VKKSSFISAL GIGILSIGMA FASPADAVGQ IRQNATQVLT ILKSGDAASA

51 RPKAEAYAVP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS

101 GTMLKFKNAT VNVKDNPIVN KGGKEIVVRA EVGIPGQKPV NMDFTTYQSG

151 GKYRTYNVAI EGTSLVTVYR NQFGEIIKAK GIDGLIAELK AKNGGK*

Further work revealed the complete nucleotide sequence <SEQ ID 353>: 

1 ATGAAAAAAT CCTCCTTCAT CAGCGCATTG GGCATCGGTA TTTTGAGCAT 

51 CGGCATGGCA TTTGCCTCCC CGGCCGACGC AGTGGGACAA ATCCGCCAAA 

101 ACGCCACACA GGTTTTGACC ATCCTCAAAA GCGGCGACGC GGCTTCTGCA

151 CGCCCAAAAG CCGAAGCCTA TGCGGTTCCC TATTTCGATT TCCAACGTAT

201 GACCGCATTG GCGGTCGGCA ACCCTTGGCG TACCGCGTCC GACGCGCAAA

251 AACAAGCGTT GGCCAAAGAA TTTCAAACCC TGCTGATCCG CACCTATTCC

301 GGCACGATGC TGAAATTCAA AAACGCGACC GTCAACGTCA AAGACAATCC

351 CATCGTCAAT AAGGGCGGCA AGGAAATCGT CGTCCGTGCC GAAGTCGGCA

401 TCCCCGGTCA GAAGCCCGTC AATATGGACT TTACCACCTA CCAAAGCGGC

451 GGCAAATACC GTACCTACAA CGTCGCCATC GAAGGCACGA GCCTGGTTAC

501 CGTGTACCGC AACCAATTCG GCGAAATCAT CAAAGCCAAA GGCATCGACG 

551 GGCTGATTGC CGAGTTGAAA GCCAAAAACG GCGGCAAATA A 

This corresponds to the amino acid sequence <SEQ ID 354; ORF91ng-1>:

1 MKKSSFISAL GIGILSIGMA FASPADAVGQ IRQNATQVLT ILKSGDAASA

51 RPKAEAYAVP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS

101 GTMLKFKNAT VNVKDNPIVN KGGKEIVVRA EVGIPGQKPV NMDFTTYQSG

151 GKYRTYNVAI EGTSLVTVYR NQFGEIIKAK GIDGLIAELK AKNGGK*

ORF91ng-1 and ORF91-1 show 92.3% identity in 196 aa overlap:

10 20 30 40 50 60 

orf 91-1 . pep MKKSSLISALGIGILSIGMAFAAPADAVSQIRQNATQVLSILKNGDANTARQKAEAYAIP 
I I II I : I I I I I I I I I I I I I I I I : I I I I I : I I I I I I I I I I : I I I : I I I : I I I I I I I I : I 
orf 91ng-l MKKS S FI S ALG I G I L S IGMAFAS PADAVGQ I RQNATQVLT I LKSGDAAS ARPKAE AYAVP 
50 10 20 30 40 50 60 

70 80 90 100 110 120 

Orf 91-1 . pep Y FDFQRMTALAVGN PWRTAS DAQKQALAKE FQTLLIRTYSGTMLKLKNANVNVKDNP I VN 
I I M I II I I I I I I I II 1 I I I I M I I 1 I I I I I II I I 1 I I I I I 1 I I 1 : M I : I I I I I I I I I I 
55 orf 91ng-l YFDFQRMTALAVGNPWRT AS DAQKQALAKE FQTLLIRTYSGTMLKFKNATVNVKDNPIVN 

70 80 90 100 110 120 
130 140 150 160 170 180

orf91-l pep KGGKEIIVRAEVGVPGQKPVNMDFTTYQSGGKYRTYNVAIEGASLVTVYRNQFGEIIKAK 
| | | | | | : I { 1 I I i : I M I I I II I I I II I I I I I II I I I I f M I : I I I I M I I ! 1 I I I M I I 
5 orf91ng-l KGGKEIWRAEVGIPGQKPVNMDFTTYQSGGKYRTYNVAIEGTSLVTVYRNQFGEIIKAK 

130 140 150 160 170 180 

190 

orf 91-1. pep GVDGLIAELKAKNGGKX 
10 I : I I I I I I I I I I I I I I I 

o r f 9 1 ng- 1 GIDGL I AE LKAKNGGKX 
190 

In addition, ORF91ng-1 shows homology to a hypothetical E.coli protein:

sp|P45390|YRBC_ECOLI HYPOTHETICAL 24.0 KD PROTEIN IN MURA-RPON INTERGENIC
REGION PRECURSOR (F211) >gi|606130 (U18997) ORF_f211 [Escherichia coli]

>gi|1789583 (AE000399) hypothetical 24.0 kD protein in murZ-rpoN intergenic
region [Escherichia coli] Length = 211

Score = 70.6 bits (170), Expect = 6e-12
Identities = 42/137 (30%), Positives = 76/137 (54%), Gaps = 6/137 (4%)

Query: 59 V P Y FDFQRMTALAVGN PWRTAS DAQKQALAKE FQT LL I RT YSGTMLK FKNATVNVKDN P I 118 

+PY + AL +G +++A+ AQ++A F+L+Y ++ T + P 
Sbjct : 65 LP YVQVKYAGALVLGQYYKSAT PAQREAYFAAFREYLKQAYGQALAMYHGQTYQI A — PE 122 



25 



Query: 119 VNKGGKEIV-VRAEVGIP-GQKPVNMDFTTYQSG — GKYRTYNVAIEGTSLVTVYRNQFG 174 

G K IV +R + P G+ PV +DF ++ G ++ Y++ EG S++T +N++G 
Sbjct: 123 QPLGDKTIVPIRVTIIDPNGRPPVRLDFQWRKNSQTGNWQAYDMIAEGVSMITTKQNEWG 182 



30 Query: 175 EIIKAKGIDGLIAELKA 191 

+++ KGIDGL A+LK+ 
Sbjct: 183 TLLRTKGIDGLTAQLKS 199 

Based on this analysis, including the presence of a putative leader sequence in the gonococcal
protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes,
could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 42 

The following DNA sequence was identified in N. meningitidis <SEQ ID 355>: 

1 ATGAAACACA TACTCCCCCT GATTGCCGCA TCCGCACTCT GCATTTCAAC 

51 CGCTTCGGCA CATCCTGCCA GCGAACCGTC CACTCAAAAC GAAACCGCTA 

101 TGATCACGCA TACCCTCATC TCAAAATACA GTTTTGGnnn nnnnnnnnnn

151 nnnnnnnnnn nnGCCATAAA AAGCAAAGGG ATGGACATTT TTGCCGTCAT 

201 CGACCATCAG GAAGCCGCAC GCCGAAACGG CTTAACGATG CAGCCGGCAA 

251 AAGTCATCGT CTTCGGCACG CCCAAAGCCG GCACGCCGCT GATGGTCAAA 

301 GACCCCGCCT TCGCCCTGCA ACTGCCCCTA CGCGTCCTCG TTACCGAAAC 

351 GGACGGCAAA GTACGCGCCG CCTATACCGA TACGCGCGCC CTCATCGCCG

401 GCAGCCGCAT CGGTTTCGAC GAAGTGGCAA ACACTTTGGC AAACGCCGAA 

451 AAACTGATAC AAAAAACCGT AGGCGAATAA 

This corresponds to the amino acid sequence <SEQ ID 356; ORF97>: 

1 MKHILPLIAA SALCISTASA HPASEPSTQN ETAMITHTLI SKYSFGXXXX 
51 XXXXAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK

101 DPAFALQLPL RVLVTETDGK VRAAYTDTRA LIAGSRIGFD EVANTLANAE 
151 KLIQKTVGE* 

Further work revealed the complete nucleotide sequence <SEQ ID 357>:

1 ATGAAACACA TACTCCCCCT GATTGCCGCA TCCGCACTCT GCATTTCAAC
51 CGCTTCGGCA CATCCTGCCA GCGAACCGTC CACCCAAAAC GAAACCGCTA
101 TGACCACGCA TACCCTCACC TCAAAATACA GTTTTGACGA AACCGTCAGC
151 CGCCTTGAAA CCGCCATAAA AAGCAAAGGG ATGGACATTT TTGCCGTCAT
201 CGACCATCAG GAAGCCGCCC GCCGAAACGG CTTAACGATG CAGCCGGCAA
251 AAGTCATCGT CTTCGGCACG CCCAAAGCCG GCACGCCGCT GATGGTCAAA
301 GACCCCGCCT TCGCCCTGCA ACTGCCCCTA CGCGTCCTCG TTACCGAAAC
351 GGACGGCAAA GTACGCGCCG CCTATACCGA TACGCGCGCC CTCATCGCCG
401 GCAGCCGCAT CGGTTTCGAC GAAGTGGCAA ACACTTTGGC AAACGCCGAA
451 AAACTGATAC AAAAAACCGT AGGCGAATAA


This corresponds to the amino acid sequence <SEQ ID 358; ORF97-1>:

1 MKHILPLIAA SALCISTASA HPASEPSTQN ETAMTTHTLT SKYSFDETVS
51 RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK 
101 DPAFALQLPL RVLVTETDGK VRAAYTDTRA LIAGSRIGFD EVANTLANAE 
151 KLIQKTVGE* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF97 shows 88.7% identity over a 159aa overlap with an ORF (ORF97a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

or f 97 .pep MKHILPLIAASALCISTASAHPASEPSTQNETAMITHTLISKYSFGXXXXXXXXAIKSKG 

I HIM I I I I I I I I I I t I II II :! I I I 1 I I II I I Mill : : I I I I I I 
orf 97a MXHILPLXXASALCISTASXHPASEPQTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 97 . pep MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK 

II I I ! I I I I I I I I I I I I I I I I I I I I I I I I I M I I i I 1 I I I I I I I I I I I I I I I I I I I I I I 
orf 97a MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVXVTETDGK 

70 80 90 100 110 120 

130 140 150 160 

orf 97 .pep VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTVGEX 
I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I : I I I 
orf 97a VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTIGEX 

130 140 150 160 



The complete length ORF97a nucleotide sequence <SEQ ID 359> is:

1 ATGANACACA TACTCCCCCT GANTGNCGCA TCCGCACTCT GCATTTCAAC
51 CGCTTCGGNN CATCCTGCCA GCGAACCGCA AACCCAAAAC GAAACCGCTA
101 TGACCACGCA TACCCTCACC TCAAAATACA GTTTTGACGA AACCGTCAGC
151 CGCCTTGAAA CCGCCATAAA AAGCAAAGGG ATGGACATTT TTGCCGTCAT
201 CGACCATCAG GAAGCCGCCC GCCGAAACGG CTTAACGATG CAGCCGGCAA
251 AAGTCATCGT CTTCGGCACG CCCAAAGCCG GTACGCCGCT GATGGTCAAA
301 GACCCCGCCT TCGCCCTGCA ACTGCCCCTG CGCGTCNTCG TTACCGAAAC
351 GGACGGCAAA GTACGCGCCG CCTATACCGA TACGCGCGCC CTCATCGCCG
401 GCAGCCGCAT CGGTTTCGAC GAAGTGGCAA ACACTTTGGC AAACGCCGAA
451 AAACTGATAC AAAAAACCAT AGGCGAATAA



This encodes a protein having amino acid sequence <SEQ ID 360>: 



1 MXHILPLXXA SALCISTASX HPASEPQTQN ETAMTTHTLT SKYSFDETVS 

51 RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK 

101 DPAFALQLPL RVXVTETDGK VRAAYTDTRA LIAGSRIGFD EVANTLANAE 

151 KLIQKTIGE*

ORF97a and ORF97-1 show 95.6% identity in 159 aa overlap: 

                    10        20        30        40        50        60
orf97a.pep  MXHILPLXXASALCISTASXHPASEPQTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG
            | |||||  |||||||||| ||||||:|||||||||||||||||||||||||||||||||
orf97-1     MKHILPLIAASALCISTASAHPASEPSTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf97a.pep  MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVXVTETDGK
            |||||||||||||||||||||||||||||||||||||||||||||||||||| |||||||
orf97-1     MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK
                    70        80        90       100       110       120

                   130       140       150       160
orf97a.pep  VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTIGEX
            ||||||||||||||||||||||||||||||||||||:|||
orf97-1     VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTVGEX
                   130       140       150       160
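
The identity figure quoted for ORF97a and ORF97-1 can be re-derived from the two listed amino acid sequences. The following is a minimal Python sketch, not the alignment program used in this work. It assumes a gap-free, position-by-position comparison (which holds for this pair), counts the unresolved residue X as a mismatch, and uses hypothetical helper names (`clean`, `percent_identity`).

```python
# Sketch: re-derive the ORF97a / ORF97-1 identity from the listed sequences.
# Assumption: the alignment has no gaps, so residue i pairs with residue i.

ORF97A = """
MXHILPLXXA SALCISTASX HPASEPQTQN ETAMTTHTLT SKYSFDETVS
RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK
DPAFALQLPL RVXVTETDGK VRAAYTDTRA LIAGSRIGFD EVANTLANAE
KLIQKTIGE
"""

ORF97_1 = """
MKHILPLIAA SALCISTASA HPASEPSTQN ETAMTTHTLT SKYSFDETVS
RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK
DPAFALQLPL RVLVTETDGK VRAAYTDTRA LIAGSRIGFD EVANTLANAE
KLIQKTVGE
"""


def clean(listing):
    """Drop the spaces and line breaks used to group residues in tens."""
    return "".join(listing.split())


def percent_identity(a, b):
    """Return (identical, overlap, percent) for two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch only handles gap-free alignments")
    # An unresolved residue (X) is never counted as identical.
    identical = sum(1 for x, y in zip(a, b) if x == y and x != "X")
    return identical, len(a), 100.0 * identical / len(a)


identical, overlap, pct = percent_identity(clean(ORF97A), clean(ORF97_1))
print(f"{identical}/{overlap} = {pct:.1f}%")  # prints 152/159 = 95.6%
```

The seven non-identical positions are the five X residues in ORF97a plus the Q/S and I/V substitutions.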

Homology with a predicted ORF from N. gonorrhoeae

ORF97 shows 88.1% identity over a 159aa overlap with a predicted ORF (ORF97.ng) from N. 
gonorrhoeae: 

orf 97 . pep MKHILPLIAASALCISTASAHPASEPSTQNETAMITHTLISKYSFGXXXXXXXXAIKSKG 60- 
I II I I I I I I I I : I I I M I M I I : : 1 1 II I I I I Mil Hill : : I I I I I I 

20 orf 97ng MKHILPPIAASAFCISTASAHPAGKPPTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG 60 






orf 97 . pep MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK 120 

I I I I I I I I I I I I I I II I I I I I I I 1 I II I 1 II I I I I I I I I I ! II M I II M I I I I I I ! I 11 
orf97ng MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK 120

orf97.pep VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTVGE 159

I I : II I I ! I I ( I : I I I I : I I I I I I I I I I I II II I II I I I 
orf97ng VRTAYTDTRALIVGSRISFDEVANTLANAEKLIQKTVGE 159



The complete length ORF97ng nucleotide sequence <SEQ ID 361> is predicted to encode a protein
having amino acid sequence <SEQ ID 362>:

1 MKHILPPIAA SAFCISTASA HPAGKPPTQN ETAMTTHTLT SKYSFDETVS
51 RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK
101 DPAFALQLPL RVLVTETDGK VRTAYTDTRA LIVGSRISFD EVANTLANAE
151 KLIQKTVGE*

Further work revealed the complete nucleotide sequence <SEQ ID 363>:

1 ATGAAACACA TACTCCCcct gatcgccgca TccgcactCT GCATTTCAAC

51 CGCTTCGGCA CACCCTGCCG GCAAACCGCC CACCCAAAAC GAAACCGCTA 

101 TGACCACGCA CACCCTCACC TCGAAATACA GTTTTGACGA AACCGTCAGC 

151 CGCCTTGAAA CCGCCATAAA AAGCAAAGGG ATGGACATTT TTGCCGTCAT 

201 CGACCATCAG GAAGCGGCAC GCCGAAACGG CCTGACCATG CAGCCGGCAA

251 AAGTCATCGT CTTCGGCACG CCCAAGGCCG GTACGCCgct GATGGTCAAA 

301 GACCCCGCCT TCGCCCTGCA ACTGCCCCTG CGCGTCCTCG TTACCGAAAC 

351 GGACGGCAAA GTACGCACCG CCTATACCGA TACGCGCGCC CTCATCGTCG 

401 GCAGCCGCAT CAGTTTCGAC GAAGTGGCAA ACACTTTGGC AAACGCCGAA 

451 AAACTGATAC AAAAAACCGT AGGCGAATAA

This corresponds to the amino acid sequence <SEQ ID 364; ORF97ng-1>:

1 MKHILPLIAA SALCISTASA HPAGKPPTQN ETAMTTHTLT SKYSFDETVS
51 RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK
101 DPAFALQLPL RVLVTETDGK VRTAYTDTRA LIVGSRISFD EVANTLANAE
151 KLIQKTVGE*
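
The correspondence between a listed nucleotide sequence and its amino acid sequence can be checked by translating the open reading frame codon by codon. The following is a minimal Python sketch under stated assumptions: it uses the standard genetic code, reads from the first base in frame 1, and reports any codon containing an unresolved base (such as N) as X, which appears to be the convention used in the listings here. The function name `translate` is illustrative and is not taken from this work.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate a DNA listing in frame 1; '*' marks a stop codon."""
    # Remove the grouping spaces and normalise lower-case bases.
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        # Codons with an ambiguous base are not in the table and become X.
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


# First 60 bases and final row of SEQ ID 363 as listed above.
start = "ATGAAACACA TACTCCCcct gatcgccgca TccgcactCT GCATTTCAAC CGCTTCGGCA"
end = "AAACTGATAC AAAAAACCGT AGGCGAATAA"

print(translate(start))  # MKHILPLIAASALCISTASA
print(translate(end))    # KLIQKTVGE*
```

Both fragments agree with the corresponding residues of SEQ ID 364 (positions 1-20 and 151-159 plus the stop codon).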

ORF97ng-1 and ORF97-1 show 96.2% identity in 159 aa overlap:

10 20 30 40 50 60 

orf 97-1 . pep MKHILPLIAASALCISTASAHPASEPSTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG 
I I II I I I I I I I I 1 I I I I I I i I I I : : I I I I II I I I I II I II I I I I I I j| I I I I I | I | | | | 
55 orf 97ng-l MKHILPLIAASALCISTASAHPAGKPPTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG 

10 20 30 40 50 60 
70 80 90 100 110 120

orf 97-1 . pep MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK 
t I I M II I I t I I It I I I I II I I I I t I I I t II I II I II ! t It I I I I I I I I I I I ! I I I II I I 
5 orf 97ng-l MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK 

70 80 90 100 110 120 

130 140 150 160 

orf97-1.pep VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTVGEX
10 I I : I I I I II I 11 : II I ! : I I I I M I M I I I t I I I 1 I I I II 

orf97ng-1 VRTAYTDTRALIVGSRISFDEVANTLANAEKLIQKTVGEX

130 140 150 160 

Based on this analysis, including the presence of a putative leader sequence in the gonococcal
protein, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF97-1 (15.3kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figures
12A & 12B show, respectively, the results of affinity purification of the GST-fusion and His-fusion
proteins. Purified GST-fusion protein was used to immunise mice, whose sera were used for
Western Blot (Figure 12C), ELISA (positive result), and FACS analysis (Figure 12D). These
experiments confirm that ORF97-1 is a surface-exposed protein, and that it is a useful immunogen.

Figure 12E shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF97-1.

Example 43 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID
365>:

1 ATGGCTTTTA TTACGCGCTT ATTCAAAAGC AGTAAATGGC TGATTGTGCC 

51 GCTGATGCTC CCCGCCTTTC AGAATGTGGC GGCGGAGGGG ATAGATGTGA 

101 GCCGTGCCGA AGCGAGGATA ACCGACGGCG GGCAGCTTTC CATCAGCAGC 

151 CGCTTCCAAA CCGAGCTGCC CGACCAGCTC CAACAGGCGT TGCGCCGGGg 

201 CGTGCCGCTC AACTTTACCT TAAGCTGGCA GCTTTCCGCC CCGATAATCG

251 CTTCTTATCG GTTTAAATTG GGGCAACTGA TTGGCGATGA CGACaATATT 

301 GACTACAAAC TGAGTTTCCA TCCGCTGACc AaACGCTACC GCGTTACCgT 

351 CGgCGCGTTT TCGACAGACT ACGACACCTT GGATGCGGCA TTGCGCGCGA 

401 CCGGCGCGGT TGCCAACTGG AAAGTCCTGA ACAAAGGCGC GCTGTCCGGT

451 GCGGAAGCAG GGGAAACCAA GGCGGAAATC CGCCTGACGC TGTCCACTTC

501 AAAACTGCCC AAGCCTTTTC AAATCAATGC ATTGACTTCT CAAAACTGGC 

551 ATTTGGATTC GGGTTGGAAA CCTCTAAACA TCATCGGGAA CAAATAA 

This corresponds to the amino acid sequence <SEQ ID 366; ORF106>: 



1 MAFITRLFKS SKWLIVPLML PAFQNVAAEG IDVSRAEARI TDGGQLSISS 

51 RFQTELPDQL QQALRRGVPL NFTLSWQLSA PIIASYRFKL GQLIGDDDNI

101 DYKLSFHPLT KRYRVTVGAF STDYDTLDAA LRATGAVANW KVLNKGALSG 

151 AEAGETKAEI RLTLSTSKLP KPFQINALTS QNWHLDSGWK PLNIIGNK* 

Further work revealed the following DNA sequence <SEQ ID 367>: 

1 ATGGCTTTTA TTACGCGCTT ATTCAAAAGC AGTAAATGGC TGATTGTGCC 

51 GCTGATGCTC CCCGCCTTTC AGAATGTGGC GGCGGAGGGG ATAGATGTGA

101 GCCGTGCCGA AGCGAGGATA ACCGACGGCG GGCAGCTTTC CATCAGCAGC

151 CGCTTCCAAA CCGAGCTGCC CGACCAGCTC CAACAGGCGT TGCGCCGGGG 

201 CGTGCCGCTC AACTTTACCT TAAGCTGGCA GCTTTCCGCC CCGATAATCG 

251 CTTCTTATCG GTTTAAATTG GGGCAACTGA TTGGCGATGA CGACAATATT 

301 GACTACAAAC TGAGTTTCCA TCCGCTGACC AACCGCTACC GCGTTACCGT 

351 CGGCGCGTTT TCGACAGACT ACGACACCTT GGATGCGGCA TTGCGCGCGA 

401 CCGGCGCGGT TGCCAACTGG AAAGTCCTGA ACAAAGGCGC GCTGTCCGGT 

451 GCGGAAGCAG GGGAAACCAA GGCGGAAATC CGCCTGACGC TGTCCACTTC 

501 AAAACTGCCC AAGCCTTTTC AAATCAATGC ATTGACTTCT CAAAACTGGC

551 ATTTGGATTC GGGTTGGAAA CCTCTAAACA TCATCGGGAA CAAATAA 

This corresponds to the amino acid sequence <SEQ ID 368; ORF106-1>: 

1 MAFITRLFKS SKWLIVPLML PAFQNVAAEG IDVSRAEARI TDGGQLSISS

51 RFQTELPDQL QQALRRGVPL NFTLSWQLSA PIIASYRFKL GQLIGDDDNI

101 DYKLSFHPLT NRYRVTVGAF STDYDTLDAA LRATGAVANW KVLNKGALSG

151 AEAGETKAEI RLTLSTSKLP KPFQINALTS QNWHLDSGWK PLNIIGNK* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF106 shows 87.4% identity over a 199aa overlap with an ORF (ORF106a) from strain A of N.
meningitidis:

10 20 30 40 50 59 

orf106.pep MAFITRLFKSSK-WLIVPLMLPAFQNVAAEGIDVSRAEARITDGGQLSISSRFQTELPDQ
i | | i f i I M I I 1 i : : II : : : : M I I I I I I II I t M : I I I I I I I I I I I I I i I I 
orfl06a MAFITRLFKS IKQWLVLLPMLSVLPDAAAEGIDVSRAEARIXDGGQLSXXSRFQTELPDQ 

10 20 30 40 50 60 



60 70 80 90 100 110 119 

orf 106. pep LQQALRRGVPLNFTLSWQLSAPIIASYRFKLGQLIGDDDNIDYKLSFHPLTKRYRVTVGA 
It i III II II II I I I I I II I I I I I I I I I I I I I I I I I I I II I t I : I I I II II t 
orf106a LQXAXXRGVXLNXTLXWQLSAPIIASYRFXLGQLIGDDDXIDYKLSFHPLTNRYRVTVGA

70 80 90 100 110 120 



120 130 140 150 160 170 179 

orf 106. pep FSTDYDTLDAALRATGAVANWKVLNKGALSGAEAGETKAEIRLTLSTSKLPKPFQINALT 
III I I I I II I I I I M I I I M I II I I M I I I I I I I I I ! I I I I I II I I I I I I I I I t I I I 1 I 
orf 106a FSTXYDTLDAALRATGAVANWKVLNKGALSGAEAGETKAEIRLTLSTSKLPKPFQINALT 
130 140 150 160 170 180 



180 190 199 

orf106.pep  SQNWHLDSGWKPLNIIGNKX
            ||||||||||||||||||||
orf106a     SQNWHLDSGWKPLNIIGNKX

190 200 



Due to the K->N substitution at residue 111, the homology between ORF106a and ORF106-1 is
87.9% over the same 199 aa overlap.



The complete length ORF106a nucleotide sequence <SEQ ID 369> is: 

1 ATGGCTTTTA TTACGCGCTT ATTCAAAAGC ATTAAACAAT GGCTTGTGCT 

51 GCTGCCGATG CTTTCCGTTT TGCCGGACGC GGCGGCGGAG GGGATAGATG 

101 TGAGCCGCGC CGAAGCGAGG ATAANCGACG GCGGGCAGCT TTCCATNAGN 

151 AGCCGCTTCC AAACCGAGCT GCCCGACCAG CTCCAANNNG CGNNGNGCCG 

201 GGGCGTGNCG CTCAACTNTA CCTTAAGNTG GCAGCTTTCC GCCCCGATAA 

251 TCGCTTCTTA TCGGTTTNAA TTGGGGCAAC TGATTGGCGA TGACGACNAT 

301 ATTGACTACA AACTGAGTTT CCATCCGCTG ACCAACCGCT ACCGCGTTAC 

351 CGTCGGCGCG TTTTCGACAG ANTACGACAC CTTGGATGCG GCATTGCGCG 

401 CGACCGGCGC GGTTGCCAAC TGGAAAGTCC TGAACAAAGG CGCGCTGTCC 

451 GGTGCGGAAG CAGGGGAAAC CAAGGCGGAA ATCCGCCTGA CGCTGTCCAC 

501 TTCAAAACTG CCCAAGCCTT TTCAAATCAA TGCATTGACT TCTCAAAACT 
551 GGCATTTGGA TTCGGGTTGG AAACCTCTAA ACATCATCGG GAACAAATAA

This encodes a protein having amino acid sequence <SEQ ID 370>: 

1 MAFITRLFKS IKQWLVLLPM LSVLPDAAAE GIDVSRAEAR IXDGGQLSXX

51 SRFQTELPDQ LQXAXXRGVX LNXTLXWQLS APIIASYRFX LGQLIGDDDX 

101 IDYKLSFHPL TNRYRVTVGA FSTXYDTLDA ALRATGAVAN WKVLNKGALS 

151 GAEAGETKAE IRLTLSTSKL PKPFQINALT SQNWHLDSGW KPLNIIGNK* 

Homology with a predicted ORF from N. gonorrhoeae 

ORF106 shows 90.5% identity over a 199aa overlap with a predicted ORF (ORF106.ng) from N. 
gonorrhoeae: 

orfl06 pep MAFITRLFKSSK-WLIVPLMLPAFQNVAAEGIDVSRAEARITDGGQLSISSRFQTELPDQ 59 

I i I I t I I I I I I It:: : I : : : : I I I ! I :: I I I I I I I I I I : I I I M I I I I I I I I I 
orfl06ng MAFITRLFKSIKQWLVLLPILSVLPDAAAEGIAATRAEARITDGGRLSISSRFQTELPDQ 60 

orf106.pep LQQALRRGVPLNFTLSWQLSAPIIASYRFKLGQLIGDDDNIDYKLSFHPLTKRYRVTVGA 119

I | | I i i I I i ( I II I t I I I II t I I I I I I I I I II I I I I M I I I M M I I I I I : I I I I I I M 
orf 10 6ng LQQALRRGVPLNFTLSWQLSAPTIASYRFKLGQLIGDDDNIDYKLSFHPLTNRYRVTVGA 120 

orf 106 . pep FSTDYDTLDAALRATGAVANWKVLNKGALSGAEAGETKAEIRLTLSTSKLPKPFQINALT 179 

I I || M I I I I II I I II I I I I II I II I I I II M i i i I I I I I ! I i i I I I I I I I 1 i I I i i I i I 
orf106ng FSTDYDTLDAALRATGAVANWKVLNKGALSGAEAGETKAEIRLTLSTSKLPKPFQINALT 180

orf106.pep  SQNWHLDSGWKPLNIIGNK 198
            |||||||||||||||||||
orf106ng    SQNWHLDSGWKPLNIIGNK 199

Due to the K->N substitution at residue 111, the homology between ORF106ng and ORF106-1 is
91.0% over the same 199 aa overlap.
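
The two adjusted figures (87.9% against ORF106a and 91.0% against ORF106ng) follow from simple arithmetic: ORF106-1 carries N where ORF106 has K, and both ORF106a and ORF106ng also carry N at the aligned position, so each comparison gains exactly one identical position over the same 199 aa overlap. The Python sketch below reproduces this. It assumes the quoted percentages are rounded to one decimal place, and the helper names are hypothetical.

```python
def identical_positions(percent, overlap):
    """Recover the count of identical residues from a rounded percentage."""
    return round(percent / 100.0 * overlap)


def percent_after_extra_matches(percent, overlap, extra=1):
    """Percent identity after `extra` mismatches become matches."""
    return 100.0 * (identical_positions(percent, overlap) + extra) / overlap


OVERLAP = 199

# ORF106 vs ORF106a: 87.4% -> 174 identical; one more match gives 175/199.
print(f"{percent_after_extra_matches(87.4, OVERLAP):.1f}%")  # 87.9%

# ORF106 vs ORF106ng: 90.5% -> 180 identical; one more match gives 181/199.
print(f"{percent_after_extra_matches(90.5, OVERLAP):.1f}%")  # 91.0%
```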



The complete length ORF106ng nucleotide sequence <SEQ ID 371> is:

1 ATGGCTTTTA TTACGCGCTT ATTCAAAAGC ATTAAACAAT GGCTTGTGCT 

51 GTTGCCGATA CTCTCCGTTT TGCCGGACGC GGCGGCGGAG GGCATTGCCG 

101 CGACCCGCGC CGAAGCGAGG ATAACCGACG GCGGGCGGCT TTCCATCAGC 

151 AGCCGCTTCC AAACCGAGCT GCCCGACCAG CTCCAACAGG CGTTGCGCCG 

201 GGGCGTACCG CTCAACTTTA CCTTAAGCTG GCAGCTTTCC GCCCCGACAA 

251 TCGCTTCTTA TCGGTTTAAA TTGGGGCAAC TGATTGGCGA TGACGACAAT 

301 ATTGACTACA AACTAAGTTT CCATCCGCTG ACCAACCGCT ACCGCGTTAC 

351 CGTCGGCGCA TTTTCCACCG ATTACGACAC TTTGGATGCG GCATTGCGCG 

401 CGACCGGCGC GGTTGCCAAC TGGAAAGTCC TGAACAAAGG CGCGTTGTCC 

451 GGTGCGGAAG CAGGGGAAAC CAAGGCGGAA ATCCGCCTGA CGCTGTCCAC 

501 TTCAAAACTG CCCAAGCCTT TCCAAATCAA CGCATTGACT TCTCAAAACT 

551 GGCATTTGGA TTCGGGTTGG AAACCTCTAA ACATCATCGG GAACAAATAA 

This encodes a protein having amino acid sequence <SEQ ID 372>: 



1 MAFITRLFKS IKQWLVLLPI LSVLPDAAAE GIAATRAEAR ITDGGRLSIS 

51 SRFQTELPDQ LQQALRRGVP LNFTLSWQLS APTIASYRFK LGQLIGDDDN

101 IDYKLSFHPL TNRYRVTVGA FSTDYDTLDA ALRATGAVAN WKVLNKGALS 

151 GAEAGETKAE IRLTLSTSKL PKPFQINALT SQNWHLDSGW KPLNIIGNK* 

Based on this analysis, including the presence of a putative leader sequence in the gonococcal
protein, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF106-1 (18kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
13A shows the results of affinity purification of the His-fusion protein, and Figure 13B shows the
results of expression of the GST-fusion in E. coli. Purified His-fusion protein was used to immunise
mice, whose sera were used for FACS analysis (Figure 13C). These experiments confirm that
ORF106-1 is a surface-exposed protein, and that it is a useful immunogen.



Example 44 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 
373>: 

1 ATGGACACAA AAGAAATCCT CGG . TACGCG GcAGGcTCGA TCGGCAGCGC 

51 GGTTTTAGCC GTCATCATCc TGCCGCTGCT GTCGTGGTAT TTCCCCGCCG 

101 ACGACATCGG GCGCATCGTG CTGATGCAGA CGGCGGCGGG GCTgACGGTG 

151 TCGGTGTTGT GCCTCGGGCT GGATCAGGCA TACGTCCGCG AATACTATGC 

201 CACCGCCGAC AAAGACAcCT TGTTCAAAAC CCTGTTCCTG CCGCCGCTGC 

251 TGTCTGCCGC CGCGATAGCC GCCCTGCTGC TTTCCCGCCC GTCCCTGCCG 

301 TCTGAAATCC TGTTTTCACT CGACGATGCC gCCGCCGGCa TCGGGCTGGT 

351 GCTGTTTGAA CtGAGCTTCC TGCCCATCCG cTTTCTCTTA CTGGTTTTGC 

401 GTATGGAAGG ACGCGCCcTT GCCTTTTCGT CCGCGCAACT CGTGCcCAAG

451 CTCGCCATCC TGCTGCTG.T GCCGCTGACG GTCGGGCTGC T*GCACTTTCC

501 AGCGAACACC GCCGTCCTGA CCGCCGTTTA CGCGCTGGCA AACCTTGCCG 

551 CCGCCGCCTT TTTGCTGTTT CAAAACCGAT GCCGTCTGAA GGCCGTCCGG 

601 CACGCACCGT TTTCGCCCGC CGTCCTGCAC CGGGGG.TGC GCTACGGCAT 

651 ACCGATCGCA CTGAGCAGCA TCGCCTATTG GGGGCTGGCA TCCGCCGACC 

701 GTTTGTTCCT GAAAAAATAT GCCGGCCTGG AACAGCTCGG CGTTTATTCG 

751 ATGGGTATTT CGTTCGGCGG GGCGGCATTA TTGTTCCAAA GCATCTTTTC 

801 AACGGTCTGG ACACCGTATA TTTTCCGCGC AATCGAAGAA AACGCCCCGC 

851 CCGCTCGCCT CTCGGCAACG GCAGAATCCG CCGCCGCCCT GCTTGCCTCC 

901 GCCCTCTGC. TGACCGGCAT TTTCTCGCCC CTTGCCTCCC TCCTGCTGCC 

951 GGAAAACTAC GCCGCCGTCC GGTTTATCGT CGTATCGTGT ATG . TGCCGC 

1001 CGCTGTTTTG CACGCTGGCG GAAATCAGCG GCATCGGTTT GAACGTCGTT 

1051 CGCAAAACGC GCCCGATCGC GCTCGCCACC TTGGGCGCGC TGGCGGCAAA 

1101 CCTGCTGCTG CTGGGGCTTG ACCGTGCCGT ACCGGCGAGG CCGCCGGCG 

1151 CGGCGGTTGC CTGTGCCGCC TCATTCTGGC TGTTTTTTGC CTTCAAGACC 

1201 GAAAGCTCyT GCCGCCTGTG GCAGCCGCTC AAACGCCTGC CGCTTTATCT 

1251 GCACACATTG TTCTGCCTGA CCTCCTCGGC GGCCTACACC TGCTTCGGCA 

1301 CGCCGGCAAA CTATCCCCTG TTTGCCGGCG TATGGGCGGC ATATCTGGCA 

1351 GGCTGCATCC TGCGCCACCG GAAAGATTTG CACAAACTGT TTCATTATTT 

1401 GAAAAAACAA GGTTTCCCAT TATGA

This corresponds to the amino acid sequence <SEQ ID 374; ORF10>: 

1 MDTKEILXYA AGSIGSAVLA VIILPLLSWY FPADDIGRIV LMQTAAGLTV 

51 SVLCLGLDQA YVREYYATAD KDTLFKTLFL PPLLSAAAIA ALLLSRPSLP 

101 SEILFSLDDA AAGIGLVLFE LSFLPIRFLL LVLRMEGRAL AFSSAQLVPK 

151 LAILLLXPLT VGLLHFPANT AVLTAVYALA NLAAAAFLLF QNRCRLKAVR 

201 HAPFSPAVLH RGXRYGIPIA LSSIAYWGLA SADRLFLKKY AGLEQLGVYS 

251 MGISFGGAAL LFQSIFSTVW TPYIFRAIEE NAPPARLSAT AESAAALLAS 

301 ALCXTGIFSP LASLLLPENY AAVRFIVVSC MXPPLFCTLA EISGIGLNVV

351 RKTRPIALAT LGALAANLLL LGLDRAVPAR PXGAAVACAA SFWLFFAFKT 

401 ESSCRLWQPL KRLPLYLHTL FCLTSSAAYT CFGTPANYPL FAGVWAAYLA 

451 GCILRHRKDL HKLFHYLKKQ GFPL* 

Further sequence analysis revealed the complete DNA sequence <SEQ ID 375> to be:

1 ATGGACACAA AAGAAATCCT CGGCTACGCG GCAGGCTCGA TCGGCAGCGC 

51 GGTTTTAGCC GTCATCATCC TGCCGCTGCT GTCGTGGTAT TTCCCCGCCG 

101 ACGACATCGG GCGCATCGTG CTGATGCAGA CGGCGGCGGG GCTGACGGTG 
151 TCGGTGTTGT GCCTCGGGCT GGATCAGGCA TACGTCCGCG AATACTATGC

201 CACCGCCGAC AAAGACACCT TGTTCAAAAC CCTGTTCCTG CCGCCGCTGC 

251 TGTCTGCCGC CGCGATAGCC GCCCTGCTGC TTTCCCGCCC GTCCCTGCCG 

301 TCTGAAATCC TGTTTTCACT CGACGATGCC GCCGCCGGCA TCGGGCTGGT 

351 GCTGTTTGAA CTGAGCTTCC TGCCCATCCG CTTTCTCTTA CTGGTTTTGC 

401 GTATGGAAGG ACGCGCCCTT GCCTTTTCGT CCGCGCAACT CGTGCCCAAG 

451 CTCGCCATCC TGCTGCTGCT GCCGCTGACG GTCGGGCTGC TGCACTTTCC

501 AGCGAACACC GCCGTCCTGA CCGCCGTTTA CGCGCTGGCA AACCTTGCCG 

551 CCGCCGCCTT TTTGCTGTTT CAAAACCGAT GCCGTCTGAA GGCCGTCCGG 

601 CACGCACCGT TTTCGCCCGC CGTCCTGCAC CGGGGGCTGC GCTACGGCAT 

651 ACCGATCGCA CTGAGCAGCA TCGCCTATTG GGGGCTGGCA TCCGCCGACC 

701 GTTTGTTCCT GAAAAAATAT GCCGGCCTGG AACAGCTCGG CGTTTATTCG 

751 ATGGGTATTT CGTTCGGCGG GGCGGCATTA TTGTTCCAAA GCATCTTTTC 

801 AACGGTCTGG ACACCGTATA TTTTCCGCGC AATCGAAGAA AACGCCCCGC 

851 CCGCCCGCCT CTCGGCAACG GCAGAATCCG CCGCCGCCCT GCTTGCCTCC 

901 GCCCTCTGCC TGACCGGCAT TTTCTCGCCC CTTGCCTCCC TCCTGCTGCC 

951 GGAAAACTAC GCCGCCGTCC GGTTTATCGT CGTATCGTGT ATGCTGCCGC 

1001 CGCTGTTTTG CACGCTGGCG GAAATCAGCG GCATCGGTTT GAACGTCGTC 

1051 CGCAAAACGC GCCCGATCGC GCTCGCCACC TTGGGCGCGC TGGCGGCAAA 

1101 CCTGCTGCTG CTGGGGCTTG CCGTGCCGTC CGGCGGCGCG CGCGGCGCGG 

1151 CGGTTGCCTG TGCCGCCTCA TTCTGGCTGT TTTTTGCCTT CAAGACCGAA 

1201 AGCTCCTGCC GCCTGTGGCA GCCGCTCAAA CGCCTGCCGC TTTATCTGCA 

1251 CACATTGTTC TGCCTGACCT CCTCGGCGGC CTACACCTGC TTCGGCACGC 

1301 CGGCAAACTA TCCCCTGTTT GCCGGCGTAT GGGCGGCATA TCTGGCAGGC 

1351 TGCATCCTGC GCCACCGGAA AGATTTGCAC AAACTGTTTC ATTATTTGAA 

1401 AAAACAAGGT TTCCCATTAT GA 

This corresponds to the amino acid sequence <SEQ ID 376; ORF10-1>: 



1 MDTKEILGYA AGSIGSAVLA VIILPLLSWY FPADDIGRIV LMQTAAGLTV
51 SVLCLGLDQA YVREYYATAD KDTLFKTLFL PPLLSAAAIA ALLLSRPSLP
101 SEILFSLDDA AAGIGLVLFE LSFLPIRFLL LVLRMEGRAL AFSSAQLVPK
151 LAILLLLPLT VGLLHFPANT AVLTAVYALA NLAAAAFLLF QNRCRLKAVR
201 HAPFSPAVLH RGLRYGIPIA LSSIAYWGLA SADRLFLKKY AGLEQLGVYS
251 MGISFGGAAL LFQSIFSTVW TPYIFRAIEE NAPPARLSAT AESAAALLAS
301 ALCLTGIFSP LASLLLPENY AAVRFIVVSC MLPPLFCTLA EISGIGLNVV
351 RKTRPIALAT LGALAANLLL LGLAVPSGGA RGAAVACAAS FWLFFAFKTE
401 SSCRLWQPLK RLPLYLHTLF CLTSSAAYTC FGTPANYPLF AGVWAAYLAG
451 CILRHRKDLH KLFHYLKKQG FPL*

Computer analysis of this amino acid sequence gave the following results: 
Prediction 

ORF10-1 is predicted to be the precursor of an integral membrane protein, since it comprises
several (12-13) potential transmembrane segments, and a probable cleavable signal peptide.
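
The prediction program used for this analysis is not described in this passage. As a rough illustration of how candidate membrane-spanning stretches can be flagged, the Python sketch below applies a Kyte-Doolittle hydropathy scan, which is a different and much simpler technique than a dedicated topology predictor. The 19-residue window and the 1.6 cutoff are conventional choices assumed here, not values taken from this work, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window; unknown residues score 0."""
    scores = [KD.get(aa, 0.0) for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_segments(seq, window=19, threshold=1.6):
    """Merge overlapping windows at or above the cutoff into 1-based ranges."""
    segments = []
    for i, avg in enumerate(hydropathy_profile(seq, window)):
        if avg < threshold:
            continue
        start, end = i + 1, i + window
        if segments and start <= segments[-1][1]:
            # Window overlaps the previous hit: extend that segment.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments


# First 50 residues of ORF10-1 <SEQ ID 376> as listed above.
N_TERMINUS = "MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTV"

for start, end in hydrophobic_segments(N_TERMINUS):
    print(f"candidate hydrophobic stretch: residues {start}-{end}")
```

A scan of this kind only highlights hydrophobic regions. It does not distinguish a cleavable signal peptide from a transmembrane helix, so it should not be expected to reproduce the 12-13 segment count stated above.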

Homology with EpsM from Streptococcus thermophilus (accession number U40830).
ORF10 shows homology with the epsM gene of S. thermophilus, which encodes a protein of a size
similar to ORF10 and is involved in exopolysaccharide synthesis. Other homologies are with
prokaryotic membrane proteins:

Identities = (25%) 

Query: 213 LRYGIPLALSSLAYWGLASADRLFLKKYAGLEQLGVYSMGISFGGAALLLQSIFSTVW 270 

L Y +PL SS + +W L ++ R F+ + G G+ ++ + +IF+ W 

Sbjct: 210 LY YALPLI PS S I LWWLLNAS SRYFVLFFLGAGANGLLAVATKI PS 1 1 S I FNT I FTQAW 267 



Identities = 15/57 (26%) , Positives = 31/57 (54%) 



Query: 7 LGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQAYVR 63 

L + G++GS +L +++PL ++ + G L QT A L + ++ + + A +R 

Sbjct: 12 LVFTIGNLGSKLLVFLLVPLYTYAMTPQEYGMADLYQTTANLLLPLITMNVFDATLR 68 



Identities = 16/96 (16%), Positives = 36/96 (37%)

Query: 307 IFSPLASLLLPENYAAVRFTVVSCMLPPLFYTLTEISGIGLNVVRKTRPIXXXXXXXXXX 366

+ p+ + + +YA+ V ML LF + ++ G ++T+ + 

Sbjct: 305 VLK P I VE KW S S D Y AS S WQ Y V P F FML SML F S S F S D FFGTN Y I AAKQT KG V FMT S I YGT I V 364 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF10 shows 95.4% identity over a 475aa overlap with an ORF (ORF10a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf 10 . pep MDTKEILXYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 
I 1 I 1 1 I I I I I I I I I I I I I I I I I I I I I I I I II I I I 1 I I I I I I I I 1 I M I I I I I I I I! I I 1 
orf 10a MDTKEILGYAAGSIGSAVIAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 10 . pep YVREYYATADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 
I I I I I I I : I I I I I I I M I M I I I I I I I I I I I I I I I I I I I I I I M I I I 11 1 I I I! I II I I I 
orf 10a YVREYYAAADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 10 . pep LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLXPLTVGLLHFPANTAVLTAVYALA 
I I I I II I I I I I M I I I II I I I I I I I ! I I I I I I II I I I I I I I I I M I II 1 1 I I M I I I 1 
orf 10a LSFLPIRFLLLVLRMEGRALAFSSAQLVSKLAILLLLPLTVGLLHFPANTAVLTAVYALA 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 10 . pep NLAAAAFLLFQNRCRLKAVRHAPFSPAVLHRGXRYGIPIALSSIAYWGLASADRLFLKKY 
I I 1 I I I II I I I I I I I I I II I : II I I Mill! I 1 I I I M M II I I I I I I I I I I I I I I I I 
orf 10a N LAAAAFL L FQNRCRLKAVRRAP FS S AVLHRGLRYG I P I AL SSI AYWGLAS ADRL FLKKY 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 10 . pep AGLEQLGVYSMGI SFGGAALLFQS I FSTVWTPYI FRAIEENAPPARLSATAESAAALLAS 
I I I I I I I I M I I I I I ! I I I I I I I I I I II I I II I I I I I I I I I I I I I I I I I I II I M I M I 
orf 10a AGLEQLGVYSMGI SFGGAALLFQS I FSTVWTPYI FRAIEANAPPARLS AT AESAAALLAS 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 10 . pep ALCXTGIFSPLASLLLPENYAAVRFIWSCMXPPLFCTLAEISGIGLNVVRKTRPIALAT 
III I I I M II I I II I I I I II I I I I I I 11 M I I I I I I I : I I I I I M I II I I I I M I I I I 
orf 10a ALCLTGIFSPLASLLLPENYAAVRFIWSCMLPPLFCTLVEISGIGLNWRKTRPIALAT 

310 320 330 340 350 360 

370 380 390 400 410 419 

orf 10 . pep LGALAANLLLLGLDRAVPAR-PXGAAVACAASFWLFFAFKTESSCRLWQPLKRLPLYLHT 
I i I I I I I I I II I I III: I i II I I I I I I I I I I : II I M I I II I II I M I I I I : I I 
orf 10a LGALAANLLLLGL — AVPSGGARGAAVACAASFWLFFVFKTESSCRLWQPLKRLPLYMHT 

370 380 390 400 410 

420 430 440 450 460 470 

orf 10 . pep LFCLTSSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKDLHKLFHYLKKQGFPLX 
I I I I : I I I I I I I I II II I f 1 I I f I I I I : I I I II I I I It I I I I I I I I I I I I I I I i i I 
orf 10a L FCLAS S AAYTC FGT PAN Y PL FAGVWAV YLAGC I LRHRKDLHKL FH YLKKQG FP LX 

420 430 440 450 460 470 



The complete length ORF10a nucleotide sequence <SEQ ID 377> is:

1 ATGGACACAA AAGAAATCCT CGGCTACGCG GCAGGCTCGA TCGGCAGCGC
51 GGTTTTAGCC GTCATCATCC TGCCGCTGCT GTCGTGGTAT TTCCCTGCCG
101 ACGACATCGG ACGCATCGTG CTGATGCAGA CGGCGGCGGG GCTGACGGTG
151 TCGGTGTTGT GCCTCGGGCT GGATCAGGCA TACGTCCGCG AATACTATGC
201 CGCCGCCGAC AAAGACACTT TGTTCAAAAC CCTGTTCCTG CCGCCGCTGC
251 TGTCTGCCGC CGCGATAGCC GCCCTGCTGC TTTCCCGCCC ATCCCTGCCG
301 TCTGAAATCC TGTTTTCGCT CGACGATGCC GCCGCCGGCA TCGGGCTGGT

351 GCTGTTTGAA CTGAGCTTCC TGCCCATCCG CTTTCTCTTA CTGGTTTTGC 

401 GTATGGAAGG ACGCGCCCTT GCCTTTTCGT CCGCGCAACT CGTGTCCAAG

451 CTCGCCATCC TGCTGCTGCT GCCGCTGACG GTCGGGCTGC TGCACTTTCC 

501 GGCGAACACC GCCGTCCTGA CCGCCGTTTA CGCGCTGGCA AACCTTGCCG 

551 CCGCCGCCTT TTTGCTGTTT CAAAACCGAT GCCGTCTGAA GGCCGTCCGG 

601 CGCGCACCGT TTTCATCCGC CGTCCTGCAT CGCGGCCTGC GCTACGGCAT 

651 ACCGATCGCA CTAAGCAGCA TCGCCTATTG GGGGCTGGCA TCCGCCGACC 

701 GTTTGTTCCT GAAAAAATAT GCCGGCCTAG AACAGCTCGG CGTTTATTCG 

751 ATGGGTATTT CGTTCGGCGG AGCGGCATTA TTGTTCCAAA GCATCTTTTC 

801 AACGGTCTGG ACACCGTATA TTTTCCGCGC AATCGAAGCA AACGCCCCGC 

851 CCGCCCGCCT CTCGGCAACG GCAGAATCCG CCGCCGCCCT GCTTGCCTCC 

901 GCCCTCTGCC TGACCGGCAT TTTCTCGCCC CTCGCCTCCC TCCTGCTGCC 

951 GGAAAACTAC GCCGCCGTCC GGTTTATCGT CGTATCGTGT ATGCTGCCTC 

1001 CGCTGTTTTG CACGCTGGTA GAAATCAGCG GCATCGGTTT GAACGTCGTC 

1051 CGAAAAACAC GCCCGATCGC GCTCGCCACC TTGGGCGCGC TGGCGGCAAA 

1101 CCTGCTGCTG CTGGGGCTTG CCGTACCGTC CGGCGGCGCG CGCGGCGCGG 

1151 CGGTTGCCTG TGCCGCCTCA TTTTGGCTGT TTTTTGTTTT CAAGACCGAA 

1201 AGCTCCTGCC GCCTGTGGCA GCCGCTCAAA CGCCTGCCGC TTTATATGCA 

1251 CACATTGTTC TGCCTGGCCT CCTCGGCGGC CTACACCTGC TTCGGCACTC 

1301 CGGCAAACTA CCCCCTGTTT GCCGGCGTAT GGGCGGTATA TCTGGCAGGC 

1351 TGCATCCTGC GCCACCGGAA AGATTTGCAC AAACTGTTTC ATTATTTGAA 

1401 AAAACAAGGT TTCCCATTAT GA 

This encodes a protein having amino acid sequence <SEQ ID 378>: 

1 MDTKEILGYA AGSIGSAVLA VIILPLLSWY FPADDIGRIV LMQTAAGLTV 

51 SVLCLGLDQA YVREYYAAAD KDTLFKTLFL PPLLSAAAIA ALLLSRPSLP 

101 SEILFSLDDA AAGIGLVLFE LSFLPIRFLL LVLRMEGRAL AFSSAQLVSK 

151 LAILLLLPLT VGLLHFPANT AVLTAVYALA NLAAAAFLLF QNRCRLKAVR 

201 RAPFSSAVLH RGLRYGIPIA LSSIAYWGLA SADRLFLKKY AGLEQLGVYS 

251 MGISFGGAAL LFQSIFSTVW TPYIFRAIEA NAPPARLSAT AESAAALLAS 

301 ALCLTGIFSP LASLLLPENY AAVRFIVVSC MLPPLFCTLV EISGIGLNVV

351 RKTRPIALAT LGALAANLLL LGLAVPSGGA RGAAVACAAS FWLFFVFKTE

401 SSCRLWQPLK RLPLYMHTLF CLASSAAYTC FGTPANYPLF AGVWAVYLAG

451 CILRHRKDLH KLFHYLKKQG FPL*

ORF10a and ORF10-1 show 95.4% identity in 475 aa overlap:

10 20 30 40 50 60 

MDTKEILXYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 
MINI! I I M I I I I I i I I M I 1 I I I I I I 1 I I I I I II 1 I I I I I M I I I I M I I I ! I I I I 
MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 
10 20 30 40 50 60 

70 80 90 100 110 120 

YVREYYATADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 
I I I I I I I : I I I I M I I I I I ! I I I I I ! I I I M I I I I I I I I I I I II I II I I ! I I 1 I I I I II I 
YVREYYAAADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 
70 80 90 100 110 120 

130 140 150 160 170 180 

LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLXPLTVGLLHFPANTAVLTAVYALA 
I 1 I I I I I I I I I I I I I I I I I I I II I I 1 I I I I I I I I I I I I I I I I I II I I ! I I I I II I I I I 
LS FL P IRFLLLVLRMEGRALAFS S AQLVSKLAI LLLL PLT VGLLHFPANTAVLTAVYALA 
130 140 150 160 170 180 

190 200 210 220 230 240 

NLAAAAFLLFQNRCRLKAVRHAPFSPAVLHRGXRYGIPIALSSIAYWGLASADRLFLKKY 
I I I I I I I I I I I I I I I I I I I I : I I I I I I II I I I I I I I I I I I II I I M II I I I I I I II I I 
NLAAAAFLLFQNRCRLKAVRRAPFSSAVLHRGLRYGIPIALSSIAYWGLASADRLFLKKY 
190 200 210 220 230 240 

250 260 270 280 290 300 

AGLEQLGVYSMGI S FGGAALLFQS I FSTVWTPYI FRAIEENAPPARLSATAESAAALLAS 
I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I 1 I I I I I I I I I I I II I I I I I I I I II I M I 
AGLEQLGVYSMGI S FGGAALLFQS I FSTVWTPYI FRAIEANAPPARLSATAESAAALLAS 
250 260 270 280 290 300 



orf 10-1 . pep 
orf 10a 

orf 10-1 . pep 
orf 10a 

orf 10-1 .pep 
orflOa 

orf 10-1 .pep 
orflOa 

orf 10-1 . pep 
orflOa 



310 320 330 340 350 360 



orf 10-1 .pep 
orflOa 

orf 10-1. pep 
orflOa 

orf 10-1 .pep 
orflOa 



ALCXTGIFSPLASLLLPENYAAVRFIWSCMXPPLFCTLAEISGIGLNWRKTRPIALAT 

Ml | | I I M I I I I I I I I I I I I I I I I I M I I I 1 1 I I I I : I I 11 I II I I I I 1 I I 11 M 1 1 
ALCLTGIFSPLASLLLPENYAAVRFIWSCMLPPLFCTLVEISGIGLNVVRKTRPIALAT 
310 320 330 340 350 360 

370 380 390 400 410 419 

LGALAANLLLLGLDRAVPAR-PXGAAVACAASFWLFFAFKTESSCRLWQPLKRLPLYLHT 

| | | | | M I I 1 I I I III: I I I I I I I I I I I I I I : I I I I I I I M I I II I I I I I I : I I 
LGALAANLLLLGL--AVPSGGARGAAVACAASFWLFFVFKTESSCRLWQPLKRLPLYMHT 

370 380 390 400 410 

420 430 440 450 460 470 

L FCLT S S AAYTC FGT PAN Y P L FAGVWAAYLAGC I LRHRKDLHKL FH YLKKQG FPLX 
I I I I : I I I I I II I M I I I 1 I I I I I I I I •' I I I ! 1 I 1 I I I I I M I I I I I II II I I M I 
LFCLASSAAYTCFGTPANYPLFAGVWAVYLAGCILRHRKDLHKLFHYLKKQGFPLX 
420 430 440 450 460 470 



Homology with a predicted ORF from N. gonorrhoeae 

ORF10 shows 94.1% identity over a 475aa overlap with a predicted ORF (ORF10.ng) from N.
gonorrhoeae:



orf lOng.pep 



orf lOnm 



MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 60 
I M | I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I M I I I M M I M I I I I I I I I I I 
MDTKEILXYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 60 






orflOng.pep YVREYYAAADKDTLFKTLFLPPLLFSAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 120 

I I I I I 11 : I I I I II I I I I I I I 11 I : I I I I I 1 I I I I I I I I I 1 1 I I I I I M I I II I II 1 1 i 
orflOnm YVREYYATADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 120 

orf lOng .pep LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLLPLTVGLLHFPANTSVLTAVYALA 180 

1 I I 1 I I 1 I I II I M I t M I II I M I I I I I II ! i I I I I I I I II I 1 I I I I i : I I II I I I M 
orflOnm LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLXPLTVGLLHFPANTAVLTAVYALA 180 



orf lOng . pep 
orflOnm 



orf lOng. pep 



orflOnm 



orflOng.pep 
orf lOnm 



orf lOng . pep 
orf lOnm 



NLAAAAFLLFQNRCRLKAVRRAPFS PAVLHRGLRYGI P LALS S LAYWGLAS ADRLFLKKY 240 
I I I I I I I II I I I I II t M I I : I M I I I I I I I I I I I I I : I I I I : I I I M I I II I I I I I I I 
NLAAAAFLLFQNRCRLKAVRHAPFSPAVLHRGXRYGIPIALSS I AYWGLAS ADRLFLKKY 24 0 



AGLEQLGVYSMGISFGGAALLLQSIFSTVWTPYIFRAIEENATPARLSATAESAAALLAS 
I I I I M I I I I I I I II I I I I I I : I I II I I I I I I II I I 1 I I I I I II I I I I I I I I I I I I I I I 
AGLEQLGVYSMGISFGGAALLFQSIFSTVWTPYIFRAIEENAPPARLSATAESAAALLAS 



300 



300 



ALCLTGI FS PLASLLLPEN YAAVRFTVVSCMLPPLFYTLTEI SGIGLNVVRKTRP IALAT 360 
III I I I I I I I I I I I I I I M I II I I I I I I I (Ml I I : I I I I II I I I I I I I I I I I I M 
ALCXTGIFSPLASLLLPENYAAVRFIWSCMXPPLFCTLAEISGIGLNVVRKTRPIALAT 360 

370 380 390 400 410 

LGALAANLLLLGL — AVPSGGTRGAAVACAASFWLFFVFKTESSCRLWQPLKRLPLYMHT 

I I I 1 I I I I II I 1 I 111: I I I I I I I I I I I I I I : I I I I II I I I I II I I I I I I I : I I 

LGALAANLLLLGL DRAVPAR-PXGAAVACAASFWLFFAFKTESSCRLWQPLKRLPLYLHT 
370 380 390 400 410 



420 430 440 450 460 470 

orflOng.pep LFCLASSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKNLHKLFHYLKKQGFPLX 
I I I 1 : I I I I I I I I 1 I I I M I I I I I I II I I I I I I I I I I I I : I I I I I I I I I I II 1 I I I 
orflOnm LFCLTSSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKDLHKLFHYLKKQGFPLX 
420 430 440 450 460 470 

The complete length ORF10ng nucleotide sequence <SEQ ID 379> is:

1 ATGGACACAA AAGAAATCCT CGGCTACGCG GCAGGCTCGA TCGGCAGCGC
51 GGTTTTAGCC GTCATCATCC TGCCGCTGCT GTCGTGGTAT TTCcccgCCG
101 ACGACATCGG GCGCATCGTG CTGATGCAGA CGGCGGCGGG ACTGACGGTG
151 TCGGTATTGT GCCTCGGGCT GGATCAGGCA TACGTCCGCG AATACTATGC
201 CGCCGCCGAC AAAGACACTT TGTTCAAAAC CCTGTTCCTG CCGCCGCTGC
251 TGTTTTCCGC CGCGATAGCC GCCCTGCTGC TTTCCCGCCC GTCCCTGCCG
301 TCTGAAATCC TGTTTTCGCT CGACGATGCC GCCGCCGGCA TCGGGCTGGT
351 GCTGTTTGAA CTGAGCTTCC TGCCCATCCG CTTTCTCTTA CTGGTTTTGC
401 GTATGGAAGG GCGCGCCCTT GCCTTTTCGT CCGCGCAACT CGTGCCCAAA

451 CTCGCCATTC TGCTGCTGTT GCCGCTGACG GTCGGGCTGC TGCACTTTCC

501 GGCGAACACC TCCGTCCTGA CCGCCGTTTA CGCGCTGGCA AACCTTGCCG 

551 CCGCCGCCTT TTTGCTGTTT CAAAACCGAT GCCGTCTGAA GGCCGTCCGG 

601 CGCGCGCCGT TTTCGCCCGC CGTCCTGCAC CGGGGGCTGC GCTACGGCAT 

651 ACCGCTCGCA CTGAGCAGCC TTGCCTATTG GGGGCTGGCA TCCGCCGACC 

701 GTTTGTTCCT GAAAAAATAT GCGGGCCTGG AACAGCTCGG CGTTTATTCG

751 ATGGGTATTT CGTTCGGCGG GGCGGCATTA TTGCTCCAAA GCATCTTTTC

801 AACGGTCTGG ACACCGTATA TTTTCCGTGC AATCGAAGAA AACGCCACGC 

851 CCGCCCGCCT CTCGGCAACG GCAGAATCCG CCGCCGCCCT GCTTGCCTCC 

901 GCCCTCTGCC TGACCGGAAT TTTCTCGCCC CTCGCCTCCC TCCTGCTGCC 

951 GGAAAACTAC GCCGCCGTCC GGTTTACCGT CGTATCGTGT ATGCTGccgc 

1001 cgctGTTTTA CACGCTGACC GAAATCAGCG GCATCGGTTT GAACGTCGTC 

1051 CGCAAAACGC GTCCGATCGC GCTTGCCACC TTGGGCGCGC TGGCGGCAAA 

1101 CCTGCTGCTG CTGGGGCTTG CCGTACCGTC CGGCGGCACG CGCGGCGCGG 

1151 CGGTTGCCTG TGCCGCCTCA TTCTGGTTGT TTTTTGTTTT CAAGACAGAA 

1201 AGCTCCTGCC GCCTGTGGCA GCCGCTCAAA CGCCTGCCGC TTTATATGCA 

1251 CACATTGTTC TGCCTgGCCT CCTCGGCGGC CTACACCTGC TTCGGCACAC 

1301 CGGCAAACTA CCCcctgttt gccggcgtAT GGGCGGCATA TCTGGCAGGC 

1351 TGCATCCTGC GCCACCGGAA AAATTTGCAC AAACTGTTTC ATTATTTGAA 

1401 AAAACAAGGT TTCCCATTAT GA 

This encodes a protein having amino acid sequence <SEQ ID 380>: 

1 MDTKEILGYA AGSIGSAVLA VIILPLLSWY FPADDIGRIV LMQTAAGLTV
51 SVLCLGLDQA YVREYYAAAD KDTLFKTLFL PPLLFSAAIA ALLLSRPSLP
101 SEILFSLDDA AAGIGLVLFE LSFLPIRFLL LVLRMEGRAL AFSSAQLVPK
151 LAILLLLPLT VGLLHFPANT SVLTAVYALA NLAAAAFLLF QNRCRLKAVR
201 RAPFSPAVLH RGLRYGIPLA LSSLAYWGLA SADRLFLKKY AGLEQLGVYS
251 MGISFGGAAL LLQSIFSTVW TPYIFRAIEE NATPARLSAT AESAAALLAS
301 ALCLTGIFSP LASLLLPENY AAVRFTVVSC MLPPLFYTLT EISGIGLNVV
351 RKTRPIALAT LGALAANLLL LGLAVPSGGT RGAAVACAAS FWLFFVFKTE
401 SSCRLWQPLK RLPLYMHTLF CLASSAAYTC FGTPANYPLF AGVWAAYLAG
451 CILRHRKNLH KLFHYLKKQG FPL*
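
The relationship between a nucleotide listing and the amino acid sequence it encodes can be checked with a short translation routine. The following is a minimal sketch, assuming the standard genetic code and reading frame 1; the names `CODON_TABLE` and `translate` are illustrative and are not part of the specification. Codons containing an ambiguous or unreadable base (for example N, k, s, w or a dot) are reported as X, as in the listings above.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: frame 1,
# standard code; this is an illustrative sketch, not the applicant's software).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA listing (whitespace and case ignored) into protein.

    Translation stops after the first stop codon, written '*'.
    Any codon that is not made of A/C/G/T only is written 'X'.
    """
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


if __name__ == "__main__":
    # First thirty nucleotides of <SEQ ID 379>
    print(translate("ATGGACACAA AAGAAATCCT CGGCTACGCG"))
```

Blocks of ten separated by spaces can be pasted directly, because whitespace is removed before the codons are read.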

ORF10ng and ORF10-1 show 96.4% identity in 473 aa overlap (in each block, the upper sequence is orf10-1.pep and the lower is orf10ng-1):

10 20 30 40 50 60 

MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 
1 I I I I I I I I I 1 I I M I I I I i I 1 ! 1 I 1 I I I I I M ! I I ! I I 1 I I I M I i II I I f I ! I I I I i I 
MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 
10 20 30 40 50 60 

70 80 90 100 110 120 

YVREYYATADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 

I I I I I II: I I I I t II I ! I I I I I 1 ! : I I I I I I I I I I I I i ! I I i I I I I M I I I 1 I I II I I I 
YVREYYAAADKDTLFKTLFLPPLLFSAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 

70 80 90 100 110 120 

130 140 150 160 170 180 

LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLLPLTVGLLHFPANTAVLTAVYALA 

II II I I I I II I I I I I I I II II i I I I I I I M i I I I I I i I I I I I I M I t t i I : I I I I I I I I I 
LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLLPLTVGLLHFPANTSVLTAVYALA 

130 140 150 160 170 180 

190 200 210 220 230 240 

NLAAAAFLLFQNRCRLKAVRHAPFSPAVLHRGLRYGIPIALSSIAYWGLASADRLFLKKY 
I I 1 M I I I I I I I 1 I I I I I II : I I I I I M I I t I I I I I I ( : I I M : I I I I I I M I i i t I i t t 
NLAAAAFLLFQNRCRLKAVRRAPFSPAVLHRGLRYGIPLALSSLAYWGLASADRLFLKKY
190 200 210 220 230 240 

250 260 270 280 290 300 

AGLEQLGVYSMGISFGGAALLFQSIFSTVWTPYIFRAIEENAPPARLSATAESAAALLAS
I I I t I I I I I t t I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I f I I I I I I I I I I I I I I I t I 
AGLEQLGVYSMGISFGGAALLLQSIFSTVWTPYIFRAIEENATPARLSATAESAAALLAS 
250 260 270 280 290 300 

310 320 330 340 350 360 

ALCLTGIFSPLASLLLPENYAAVRFIWSCMLPPLFCTLAEISGIGLNWRKTRPIALAT 
I I I II I I I I I I I I i f I I I I I I I I I I I I I I I I I I I I I I : I t I II I I I I I I I I I I I I I 1 I 



orf10ng-1 ALCLTGIFSPLASLLLPENYAAVRFTWSCMLPPLFYTLTEISGIGLNWRKTRPIALAT
310 320 330 340 350 360

370 380 390 400 410 420 

orf10-1.pep LGALAANLLLLGLAVPSGGARGAAVACAASFWLFFAFKTESSCRLWQPLKRLPLYLHTLF
            |||||||||||||||||||:|||||||||||||||:|||||||||||||||||||:||||
orf10ng-1   LGALAANLLLLGLAVPSGGTRGAAVACAASFWLFFVFKTESSCRLWQPLKRLPLYMHTLF
370 380 390 400 410 420

430 440 450 460 470

orf10-1.pep CLTSSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKDLHKLFHYLKKQGFPLX
            ||:||||||||||||||||||||||||||||||||||:||||||||||||||||
orf10ng-1   CLASSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKNLHKLFHYLKKQGFPLX

430 440 450 460 470 



Based on this analysis, including the presence of a putative leader peptide, several
transmembrane segments and a leucine-zipper motif (4 Leu residues spaced by 6
aa, shown in bold), it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
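
The leucine-zipper criterion stated above (4 Leu residues spaced by 6 aa, i.e. L-x6-L-x6-L-x6-L) can be expressed as a simple pattern search. The following is a minimal sketch under that reading of the motif; the names `ZIPPER` and `find_leucine_zippers` are hypothetical, and the specification does not state which program was actually used to detect the motif.

```python
import re

# Four Leu residues, each separated from the next by exactly six residues.
# The lookahead lets overlapping candidates be reported as separate hits.
ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(protein):
    """Return a list of (1-based start position, 22-residue segment) hits."""
    protein = "".join(protein.split()).upper()
    return [(m.start() + 1, m.group(1)) for m in ZIPPER.finditer(protein)]


if __name__ == "__main__":
    # Synthetic example, not a sequence from this specification
    demo = "MK" + "LAAAAAA" * 3 + "L" + "GG"
    for start, segment in find_leucine_zippers(demo):
        print(start, segment)
```

A hit from such a scan is only a candidate motif; it says nothing by itself about helical structure or function.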

Example 45 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 381>:

1 ..ATCCTGAAAC CGCATAACCA GCTTAAGGAA GACATCCAAC CTGATCCGGC

51 CGATCAAAAC GCCTTGTCCG AACCGGATGC TGCGACAGAG GCAGAGCAGT 

101 CGGATGCGGA AAATGCTGCC GACAAGCAGC CCGTTGCCGA TAAAGCCGAC 

151 GAGGTTGAAG AAAAGGCGGG CGAGCCGGAA CGGGAAGAGC CGGACGGACA 

201 GGCAGTGCGT AAGAAAGCGC TGACGGAAGA GCGTGAACAA ACCGTCAGGG 

251 AAAAAGCGCA GAAGAAAGAT GCCGAAACGG TTAAAATACA AGCGGTAAAA 

301 CCGTCTAAAG AAACAGAGAA AAAAGCTTCA AAAGAAGAGA AAAAGGCGGC 

351 GAAGGAAAAA GTTGCACCCA AACCAACCCC GGAACAAATC CTCAACAGCG 

401 GCAgCATCGA AAAmGCGCGC AgTGCCGCCG CCAAAGAAGT GCAGAAAATG 

451 AA.AACGTCC GACAAGGCGG AAGC.AACGC ATTATCTGCA AATGGGCGCG

501 TATGCCGACC GTCAGAGCGC GGAAGGGCAG CGTGCCAAAC TGGCAATCTT 

551 GGGCATATCT TCCAAGGTGG TCGGTTATCA GGCGGGACAT AAAACGCTTT 

601 ACCGGGTGCA AAGCGGCAAT ATGTCTGCCG ATGCGGTGA 

This corresponds to the amino acid sequence <SEQ ID 382; ORF65>: 

1 ..ILKPHNQLKE DIQPDPADQN ALSEPDAATE AEQSDAENAA DKQPVADKAD
51 EVEEKAGEPE REEPDGQAVR KKALTEEREQ TVREKAQKKD AETVKIQAVK 
101 PSKETEKKAS KEEKKAAKEK VAPKPTPEQI LNSGSIEXAR SAAAKEVQKM 
151 XNVRQGGSXR IICKWARMPT VRARKGSVPN WQSWAYLPRW SVIRRDIKRF 
201 TGCKAAICLP MR* 

Further work revealed the complete nucleotide sequence <SEQ ID 383>: 



1 ATGTTTATGA ACAAATTTTC CCAATCCGGA AAAGGTCTGT CCGGTTTTTT 

51 CTTCGGTTTG ATACTGGCGA CGGTCATTAT TGCCGGTATT TTGTTTTATC 

101 TGAACCAGAG CGGTCAAAAT GCGTTCAAAA TCCCGGCTTC GTCGAAGCAG 

151 CCTGCAGAAA CGGAAATCCT GAAACCGAAA AACCAGCCTA AGGAAGACAT 

201 CCAACCTGAA CCGGCCGATC AAAACGCCTT GTCCGAACCG GATGCTGCGA 

251 CAGAGGCAGA GCAGTCGGAT GCGGAAAAAG CTGCCGACAA GCAGCCCGTT 

301 GCCGATAAAG CCGACGAGGT TGAAGAAAAG GCGGGCGAGC CGGAACGGGA 

351 AGAGCCGGAC GGACAGGCAG TGCGTAAGAA AGCGCTGACG GAAGAGCGTG 

401 AACAAACCGT CAGGGAAAAA GCGCAGAAGA AAGATGCCGA AACGGTTAAA 

451 AAACAAGCGG TAAAACCGTC TAAAGAAACA GAGAAAAAAG CTTCAAAAGA

501 AGAGAAAAAG GCGGCGAAGG AAAAAGTTGC ACCCAAACCA ACCCCGGAAC 

551 AAATCCTCAA CAGCGGCAGC ATCGAAAAAG CGCGCAGTGC CGCCGCCAAA 

601 GAAGTGCAGA AAATGAAAAC GTCCGACAAG GCGGAAGCAA CGCATTATCT 

651 GCAAATGGGC GCGTATGCCG ACCGTCAGAG CGCGGAAGGG CAGCGTGCCA 
701 AACTGGCAAT CTTGGGCATA TCTTCCAAGG TGGTCGGTTA TCAGGCGGGA

751 CATAAAACGC TTTACCGGGT GCAAAGCGGC AATATGTCTG CCGATGCGGT 

801 GAAAAAAATG CAGGACGAGT TGAAAAAACA TGAAGTCGCC AGCCTGATCC 

851 GTTCTATCGA AAGCAAATAA 

This corresponds to the amino acid sequence <SEQ ID 384; ORF65-1>:

1 MFMNKFSQSG KGLSGFFFGL ILATVIIAGI LFYLNQSGQN AFKIPASSKQ
51 PAETEILKPK NQPKEDIQPE PADQNALSEP DAATEAEQSD AEKAADKQPV
101 ADKADEVEEK AGEPEREEPD GQAVRKKALT EEREQTVREK AQKKDAETVK
151 KQAVKPSKET EKKASKEEKK AAKEKVAPKP TPEQILNSGS IEKARSAAAK
201 EVQKMKTSDK AEATHYLQMG AYADRQSAEG QRAKLAILGI SSKVVGYQAG
251 HKTLYRVQSG NMSADAVKKM QDELKKHEVA SLIRSIESK*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF65 shows 92.0% identity over a 150aa overlap with an ORF (ORF65a) from strain A of N. 
meningitidis: 

10 20 30 

orf65 pep ILKPHNQLKEDIQPDPADQNALSEPDAATE 

I I I I : I I 1 I 1 I I I : I I I I II I 1 I I I I I I 
orf 65a 1 1 AG I L F Y LNQ S GQN AFK I P V P S KQ P AE T E I LK PKN Q PRE D I Q PE P ADQN ALS E P D AAKE 

30 40 50 60 70 80 



40 50 60 70 80 90 

orf 65 . pep AEQSDAENAADKQPVADKADEVEEKAGEPEREEPDGQAVRKKALTEEREQTVREKAQKKD 
I | M I M : II I I M I I I I I I I i I I I I Mill: I I II I I I I I I I I I I 1 I I I I I I I I I I 
orf 65a AEQSDAEKAADKQPVADKADEVEEKADEPEREKSDGQAVRKKALTEEREQTVGEKAQKKD 
90 100 110 120 130 140 

100 110 120 130 140 150 

orf 65 . pep AETVKIQAVKPSKETEKKASKEEKKAAKEKVAPKPTPEQILNSGSIEXARSAAAKEVQKM 
11(11 I I II I I I II I I I II t I I! I I I I I I I I I II I I I I I I I I I II I I I I I I I I I II I 
orf 65a AETVKKQAVKPSKETEKKASKEEKKAEKEKVAPKPTPEQILNSGSIEKARSAAAKEVQKM 
150 160 170 180 190 200 

160 170 180 190 200 210 

orf 65. pep XNVRQGGSXRIICKWARMPTVRARKGSVPNWQSWAYLPRWSVIRRDIKRFTGCKAAICLP 

orf 65a KTPDECAEATHYLQMGAYADRRSAEGQRAKLAILGISSKWGYQAGHKTLYRVQSGNMSAD 
210 220 230 240 250 260 

The complete length ORF65a nucleotide sequence <SEQ ID 385> is: 



1 ATGTTTATGA ACAAATTTTC CCAATCCGGA AAAGGTCTGT CCGGTTTTTT 

51 CTTCGGTTTG ATACTGGCGA CGGTCATTAT TGCCGGTATT TTGTTTTATC 

101 TGAACCAGAG CGGTCAAAAT GCGTTCAAAA TCCCGGTTCC GTCGAAGCAG 

151 CCTGCAGAAA CGGAAATCCT GAAACCGAAA AACCAGCCTA AGGAAGACAT 

201 CCAACCTGAA CCGGCCGATC AAAACGCCTT GTCCGAACCG GATGCTGCGA 

251 AAGAGGCAGA GCAGTCGGAT GCGGAAAAAG CTGCCGACAA GCAGCCCGTT 

301 GCCGACAAAG CCGACGAGGT TGAGGAAAAG GCGGACGAGC CGGAGCGGGA 

351 AAAGTCGGAC GGACAGGCAG TGCGCAAGAA AGCACTGACG GAAGAGCGTG 

401 AACAAACCGT CGGGGAAAAA GCGCAGAAGA AAGATGCCGA AACGGTTAAA 

451 AAACAAGCGG TAAAACCATC TAAAGAAACA GAGAAAAAAG CTTCAAAAGA 

501 AGAGAAAAAG GCGGAGAAGG AAAAAGTTGC ACCCAAACCG ACCCCGGAAC 

551 AAATCCTCAA CAGCGGCAGC ATCGAAAAAG CGCGCAGTGC CGCTGCCAAA 

601 GAAGTGCAGA AAATGAAAAC GCCCGACAAG GCGGAAGCAA CGCATTATCT 

651 GCAAATGGGC GCGTATGCCG ACCGCCGGAG CGCGGAAGGG CAGCGTGCCA 

701 AACTGGCAAT CTTGGGCATA TCTTCCAAGG TGGTCGGTTA TCAGGCGGGA

751 CATAAAACGC TTTACCGGGT GCAAAGCGGC AATATGTCTG CCGATGCGGT 

801 GAAAAAAATG CAGGACGAGT TGAAAAAACA TGAAGTCGCC AGCCTGATCC 

851 GTTCTATCGA AAGCAAATAA 

This encodes a protein having amino acid sequence <SEQ ID 386>: 
1 MFMNKFSQSG KGLSGFFFGL ILATVIIAGI LFYLNQSGQN AFKIPVPSKQ
51 PAETEILKPK NQPKEDIQPE PADQNALSEP DAAKEAEQSD AEKAADKQPV
101 ADKADEVEEK ADEPEREKSD GQAVRKKALT EEREQTVGEK AQKKDAETVK
151 KQAVKPSKET EKKASKEEKK AEKEKVAPKP TPEQILNSGS IEKARSAAAK
201 EVQKMKTPDK AEATHYLQMG AYADRRSAEG QRAKLAILGI SSKVVGYQAG
251 HKTLYRVQSG NMSADAVKKM QDELKKHEVA SLIRSIESK*

ORF65a and ORF65-1 show 96.5% identity in 289 aa overlap: 

10 20 30 40 50 60 

orf 65a . pep MFMNKFSQSGKGLSGFFFGLILATVIIAGILFYLNQSGQNAFKIPVPSKQPAETEILKPK 
10 I { I 1 I I I I ! I I I I t I I I I I I I I i I I ! I I I I I i I I I I I I M I I I I I : I I I I II I I 1 I I M 

orf 65-1 MFMNKFSQSGKGLSGFFFGLILATVIIAGILFYLNQSGQNAFKIPASSKQ PAETEILKPK 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf65a.pep NQPKEDIQPEPADQNALSEPDAAKEAEQSDAEKAADKQPVADKADEVEEKADEPEREKSD

I | I I I 1 I I I I I I I I I I I 1 II 1 II I I I I I I I I I I i I I I 1 i I I I I I I I M I I Mill: I 
orf 65-1 NQPKEDIQPEPADQNALSEPDAATEAEQSDAEKAADKQPVADKADEVEEKAGEPEREEPD 
70 80 90 100 110 120 

130 140 150 160 170 180

orf 65a. pep GQAVRKKALTEEREQTVGEKAQKKDAETVKKQAVKPSKETEKKASKEEKKAEKEKVAPKP 
! II I I i II I I I I 1 I I I I I I I I I I I I I I t I I I i I I ! I I I I i ! I I I M I ! t I I i I I I I i ! 
orf 65-1 GQAVRKKALTEEREQTVREKAQKKDAETVKKQAVKPSKETEKKASKEEKKAAKEKVAPKP 
130 140 150 160 170 180 


190 200 210 220 230 240 

orf 65a . pep TPEQILNSGS IEKARSAAAKEVQKMKTPDKAEATHYLQMGAYADRRSAEGQRAKLAILGI 
I t t I I I 1 t I t I I I I t I t I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I M 
orf65-1 TPEQILNSGSIEKARSAAAKEVQKMKTSDKAEATHYLQMGAYADRQSAEGQRAKLAILGI

190 200 210 220 230 240

250 260 270 280 290 

orf 65a . pep SSKWGYQAGHKTLYRVQSGNMSADAVKKMQDELKKHEVASLIRSIESKX 
I I I I I 1 I I 1 I I 1 I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf65-1 SSKWGYQAGHKTLYRVQSGNMSADAVKKMQDELKKHEVASLIRSIESKX

250 260 270 280 290 

Homology with a predicted ORF from N. gonorrhoeae 

ORF65 shows 89.6% identity over a 212aa overlap with a predicted ORF (ORF65.ng) from N.
gonorrhoeae:

30 40 50 60 70 80 

ORF65ng IIAGILLYLNQGGQNAFKIPAPSKQPAETEILKLKNQPKEDIQPEPADQNALSEPDVAKE 

III : I I I I I I I I : I I I I I I I I I I I : I I 
ORF65 ILKPHNQLKEDIQPDPADQNALSEPDAATE

10 20 30

90 100 110 120 130 140 

ORF65ng AEQSDAEKAADKQPVADKADEVEEKAGEPEREEPDGQAVRKKALTEEREQTVREKAQKKD 

I I I I I I I : I I I I I I I II I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
ORF65 AEQSDAENAADKQPVADKADEVEEKAGEPEREEPDGQAVRKKALTEEREQTVREKAQKKD

40 50 60 70 80 90 

150 160 170 180 190 200 

ORF65ng AETVKKKAVKPSKETEKKASKEEKKAAKEKVAPKPTPEQILNSRSIEKARSAAAKEVQKM 
55 | | | i | : f I I I I I I I I I M I I I 1 I 1 I I I 1 I I I I I I I I I I I f I I lit I I I I i I I 1 I I I I 

ORF65 AETVKIQAVKPSKETEKKASKEEKKAAKEKVAPKPTPEQILNSGSIEXARSAAAKEVQKM 
100 110 120 130 140 150 

210 220 230 240 250 260 

ORF65ng KNFGQGGSQRIICKWARMPNPGARKGSVPNWQSWAYLPKWSAIRRDIKRFTACKAAICPP

I I I I I ! I I I I I I I I I : I I I I I I I I I I I I I I I I : I I : I I I I I I I I I : I I I M I I 
ORF65 XNVRQGGSXRIICKWARMPTVRARKGSVPNWQSWAYLPRWSVIRRDIKRFTGCKAAICLP 
160 170 180 190 200 210 
ORF65ng MR
        ||
ORF65   MR

An ORF65ng nucleotide sequence <SEQ ID 387> was predicted to encode a protein having amino acid sequence <SEQ ID 388>:



1 MFMNKFSQSG KGLSGFFFGL ILATVIIAGI LLYLNQGGQN AFKIPAPSKQ
51 PAETEILKLK NQPKEDIQPE PADQNALSEP DVAKEAEQSD AEKAADKQPV
101 ADKADEVEEK AGEPEREEPD GQAVRKKALT EEREQTVREK AQKKDAETVK
151 KKAVKPSKET EKKASKEEKK AAKEKVAPKP TPEQILNSRS IEKARSAAAK
201 EVQKMKNFGQ GGSQRIICKW ARMPNPGARK GSVPNWQSWA YLPKWSAIRR
251 DIKRFTACKA AICPPMR*



After further analysis, the complete gonococcal DNA sequence <SEQ ID 389> was found to be:

1 ATGTTTATGA ACAAATTTTC CCAATCCGGA AAAGGTCTGT CCGGTTTCTT
51 CTTCGGTTTG ATACTGGCAA CGGTCATTAT TGCCGGTATT TTGCTTTATC
101 TGAACCAGGG CGGTCAAAAT GCGTTCAAAA TCCCGGCTCC GTCGAAGCAG
151 CCTGCAGAAA CGGAAATCCT GAAACTGAAA AACCAGCCTA AGGAAGACAT
201 CCAACCTGAA CCGGCCGATC AAAACGCCTT GTCCGAACCG GATGTTGCGA
251 AAGAGGCAGA GCAGTCGGAT GCGGAAAAAG CTGCCGACAA GCAGCCCGTT
301 GCCGACAAag ccgacgAGGT TGAAGAAAag GcGGgcgAgc cggaACGGga
351 aGAGCCGGAC ggACAGGCAG TGCGCAAGAA AGCACTGAcg gAAGAgcGTG
401 AACAAACcgt cagggAAAAA GCGCagaaga AAGATGCCGA AACGgTTAAA
451 AAacaaGCgg tAaaaccgtc tAAAGAAACa gagaaaaaag cTtcaaaaga
501 agagaaaaag gcggcgaaag aaaAAGttgc acccaaaccg accccggaaC
551 aaatcctcaa cagccgCagc atcgaaaaag cgcgtagtgc cgctgccaaa
601 gaAgtgcaGA AAatgaaaaa ctTtgggcaa ggcgGaagcc aacgcattaT
651 CTGcaaatgg gcgcgtatgc cgaccgtccg gagcgcggaA gggcagcgtg
701 ccaaACtggc aAtcttgGgc atatctTccg aagtggtcgG CTATCAGGCG
751 GGACATAAAA CGCTTTACCG CGTGCAAagc GGCAatatgt ccgccgatgc
801 gGTGAAAAAA ATGCAGGACG AGTTGAAAAA GCATGGGGtt gcCAGCCTGA
851 TCCGTGcgAT TGAAGGCAAA TAA



This encodes the following amino acid sequence <SEQ ID 390>: 



1 MFMNKFSQSG KGLSGFFFGL ILATVIIAGI LLYLNQGGQN AFKIPAPSKQ
51 PAETEILKLK NQPKEDIQPE PADQNALSEP DVAKEAEQSD AEKAADKQPV
101 ADKADEVEEK AGEPEREEPD GQAVRKKALT EEREQTVREK AQKKDAETVK
151 KQAVKPSKET EKKASKEEKK AAKEKVAPKP TPEQILNSRS IEKARSAAAK
201 EVQKMKNFGQ GGSQRIICKW ARMPTVRSAE GQRAKLAILG ISSEVVGYQA
251 GHKTLYRVQS GNMSADAVKK MQDELKKHGV ASLIRAIEGK



ORF65ng-1 and ORF65-1 show 89.0% identity in 290 aa overlap:



10 20 30 40 50 60 

orf 65-1 .pep MFMNKFSQSGKGLSGFFFGLILATVIIAGILFYLNQSGQNAFKIPASSKQPAETEILKPK 
I I I I I I I II I I I I ! i I f i I II i i i M I I 1 ! I : I ! 1 I : I I I I I I I ! I i I M I I I i M I I 
orf65ng-l MFMNKFSQSGKGLSGFFFGLILATVIIAGILLYLNQGGQNAFKIPAPSKQPAETEILKLK 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf65-1.pep NQPKEDIQPEPADQNALSEPDAATEAEQSDAEKAADKQPVADKADEVEEKAGEPEREEPD
I I I I I II I I I I 1 I I I I I II I I : I II I I I I I I I M I I I I I I I I ! I I I I I I I I I I I I I I I I 
orf 65ng-l NQPKEDIQPEPADQNALSEPDVAKEAEQSDAEKAADKQPVADKADEVEEKAGEPEREEPD 

70 80 90 100 110 120 

130 140 150 160 170 180 

or f 65-1 . pep GQAVRKKALTEEREQTVREKAQKKDAETVKKQAVKPSKETEKKASKEEKKAAKEKVAPKP 
If I ( I I I I I I I I ( I I I I I i I I I I I I I I ( I I I I I I I t ( I I I I I I M I I I ( I I I t I I i I I II 
orf65ng-l GQAVRKKALTEEREQTVREKAQKKDAETVKKQAVKPSKETEKKASKEEKKAAKEKVAPKP 

130 140 150 160 170 180 

190 200 210 220 230 239 

orf 65-1 . pep TPEQILNSGSIEKARSAAAKEVQKMKTSDKAEATHYL-QMGAYADRQSAEGQRAKLAILG 
I I 1 I I I I I I I I I I I I I I I I I I M I I : :::::::: : I I I I I I I I I I I II 
orf65ng-l TPEQILNSRS IEKARSAAAKEVQKMKNFGQGGSQRIICKWARMPTVRSAEGQRAKLAILG 
190 200 210 220 230 240

240 250 260 270 280 290 

orf65-1.pep ISSKWGYQAGHKTLYRVQSGNMSADAVKKMQDELKKHEVASLIRSIESKX
M t : I i M I i I t M I I I I I I t I I I I I I I I I I I M I M I i I I I 1 I : t I : I I 
orf65ng-1 ISSEWGYQAGHKTLYRVQSGNMSADAVKKMQDELKKHGVASLIRAIEGKX
250 260 270 280 290 

On this basis, including the presence of a putative transmembrane domain in the gonococcal
protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes,
could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 46 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 391>:

1 ATGAACCACG ACATCACTTT CCTCACCCTG TTCCTACTCG GTkTCTTCGG 

51 CGGAAcGCAC TGCATCGGTA TGTGCGGCGG ATTAAGCAGC GcGTTTGs.s

101 TCCAACTCCC CCCGCATATC AACCGCTTTT GGCTGATCCT GCTGCTTAAC 

151 ACAGGACGGG TAAGCAGCTA TACGGCAAtC GGCCTGATAC TCGGATTAAT 

201 CGGACAGGTC GGCGTTTCAC TCGAcCAaAC CCGCGTCCTG CAGAATATTT 

251 TATACACGGC CGCCAACCTC CTGCTGCTCT TTTTAGGCTT ATACTTGAGC 

301 GGTATTTCTT CCTTGGCGGC AAAAATCGAG AAaATCGGCA AACCGATATG 

351 GCGGAACCTG AACCCGATAC TCAACCGGCT GTTACCCATA AAATCCATAC 

401 CCGCCTGCCT tGCGgTCGGA ATATTATGGG GCTGGCTGCC GTGCGGACTG 

451 GTTTACAGCG CGTCGCTTTA CGCGCTGGGA AgCGGTAGTG CGGCAACGGG

501 CGGGTTATAT ATGCTTGCCT TTGCACTGGG TACGCTGCCC AATCTTtTAG 

551 CAATCGGCAT TTTtTCCCTG CAACTGAAwA AAATCATGCA AAACCGATAT 

601 ATCCGCCTGT GTACGGGATT ATCCGTATCA TTATGGGCAT TATGGAAACT 

651 TGCCGTCCTG TGGCTGTAA 

This corresponds to the amino acid sequence <SEQ ID 392; ORF103>: 

1 MNHDITFLTL FLLGXFGGTH CIGMCGGLSS AFXXQLPPHI NRFWLILLLN 

51 TGRVSSYTAI GLILGLIGQV GVSLDQTRVL QNILYTAANL LLLFLGLYLS 

101 GISSLAAKIE KIGKPIWRNL NPILNRLLPI KSIPACLAVG ILWGWLPCGL 

151 VYSASLYALG SGSAATGGLY MLAFALGTLP NLLAIGIFSL QLXKIMQNRY 

201 IRLCTGLSVS LWALWKLAVL WL* 

Further work elaborated the DNA sequence <SEQ ID 393> as: 

1 ATGAACCACG ACATCACTTT CCTCACCCTG TTCCTACTCG GTTTCTTCGG 

51 CGGAACGCAC TGCATCGGTA TGTGCGGCGG ATTAAGCAGC GCGTTTGCGC 

101 TCCAACTCCC CCCGCATATC AACCGCTTTT GGCTGATCCT GCTGCTTAAC 

151 ACAGGACGGG TAAGCAGCTA TACGGCAATC GGCCTGATAC TCGGATTAAT 

201 CGGACAGGTC GGCGTTTCAC TCGACCAAAC CCGCGTCCTG CAGAATATTT 

251 TATACACGGC CGCCAACCTC CTGCTGCTCT TTTTAGGCTT ATACTTGAGC 

301 GGTATTTCTT CCTTGGCGGC AAAAATCGAG AAAATCGGCA AACCGATATG 

351 GCGGAACCTG AACCCGATAC TCAACCGGCT GTTACCCATA AAATCCATAC 

401 CCGCCTGCCT TGCGGTCGGA ATATTATGGG GCTGGCTGCC GTGCGGACTG 

451 GTTTACAGCG CGTCGCTTTA CGCGCTGGGA AGCGGTAGTG CGGCAACGGG 

501 CGGGTTATAT ATGCTTGCCT TTGCACTGGG TACGCTGCCC AATCTTTTAG 

551 CAATCGGCAT TTTTTCCCTG CAACTGAAAA AAATCATGCA AAACCGATAT 

601 ATCCGCCTGT GTACGGGATT ATCCGTATCA TTATGGGCAT TATGGAAACT 

651 TGCCGTCCTG TGGCTGTAA 

This corresponds to the amino acid sequence <SEQ ID 394; ORF103-1>: 



1 MNHDITFLTL FLLGFFGGTH CIGMCGGLSS AFALQLPPHI NRFWLILLLN
51 TGRVSSYTAI GLILGLIGQV GVSLDQTRVL QNILYTAANL LLLFLGLYLS
101 GISSLAAKIE KIGKPIWRNL NPILNRLLPI KSIPACLAVG ILWGWLPCGL
151 VYSASLYALG SGSAATGGLY MLAFALGTLP NLLAIGIFSL QLKKIMQNRY
201 IRLCTGLSVS LWALWKLAVL WL*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF103 shows 93.8% identity over a 222aa overlap with an ORF (ORF103a) from strain A of N. meningitidis (in each block, the upper sequence is orf103.pep and the lower is orf103a):



10 20 30 40 50 60 

MNHDITFLTLFLLGXFGGTHCIGMCGGLSSAFXXQLPPHINRFWLILLLNTGRVSSYTAI 

|| M I | I I I I I I I II I I I I I II I I I 11 I I I I 1 I I I M I I I I I f f I i I I I I I 1 I I I 
MNXDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRXWLILLLNTGRVSSYTAI 
10 20 30 40 50 60 

70 80 90 100 110 120 

GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 

I I I I I I I I I I M I I I I I I I I 1 1 M I I I ! I I I I t I f I i I I t I i t I I M I I I M I I I I I I I 
GLILGLIGQVGVSLDQTRVXQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 
70 80 90 100 110 120 

130 140 150 160 170 180 

NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP 

I I I t I i I I I I i I I I II I I I I I I I I I I I I I I I I I I I II I I I I I I t I I t II I I I I I I I I I I I 
NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP 

130 140 150 160 170 180 

190 200 210 220 

NLLAIGIFSLQLXKIMQNRYIRLCTGLSVSLWALWKLAVLWLX 

II I I I I I I I I I I I I I I I I I I I I I I II I I I I 1 I I I I I I II I I I 
NLXAIGIFSLQLXKIMQNRYIRLCTGLSVSLWALWKLAVLWLX 

190 200 210 220 



The complete length ORF103a nucleotide sequence <SEQ ID 395> is:

1 ATGAACCANG ACATCACTTT CCTCACCCTG TTCCTACTCG GTTTCTTCGG
51 CGGAACGCAC TGCATCGGTA TGTGCGGCGG ATTAAGCAGC GCGTTTGCGC
101 TCCAACTCCC CCCGCATATC AACCGCTTNT GGCTGATCCT GCTGCTTAAC
151 ACAGGACGGG TAAGCAGCTA TACGGCAATC GGCCTGATAC TCGGATTAAT
201 CGGACAGGTC GGCGTTTCAC TCGACCAAAC CCGCGTCNTG CAGAATATTT
251 TATACACGGC CGCCAACCTC CTGCTGCTCT TTTTAGGCTT ATACTTGAGC
301 GGTATTTCTT CCTTGGCGGC AAAAATCGAG AAAATCGGCA AACCGATATG
351 GCGGAACCTG AACCCGATAC TCAACCGGCT GTTACCCATA AAATCCATAC
401 CCGCCTGCCT TGCGGTCGGA ATATTATGGG GCTGGCTGCC GTGCGGACTA
451 GTTTACAGCG CGTCGCTTTA CGCGCTGGGA AGCGGTAGTG CGGCAACGGG
501 CGGGTTATAT ATGCTTGCCT TTGCACTGGG TACGCTGCCC AATCTTTNGG
551 CAATCGGCAT TTTTTCCCTG CAACTGNAAA AAATCATGCA AAACCGATAT
601 ATCCGCCTGT GTACGGGATT ATCCGTATCA TTATGGGCAT TATGGAAACT
651 TGCCGTCCTG TGGCTGTAA



This encodes a protein having amino acid sequence <SEQ ID 396>: 

1 MNXDITFLTL FLLGFFGGTH CIGMCGGLSS AFALQLPPHI NRXWLILLLN
51 TGRVSSYTAI GLILGLIGQV GVSLDQTRVX QNILYTAANL LLLFLGLYLS
101 GISSLAAKIE KIGKPIWRNL NPILNRLLPI KSIPACLAVG ILWGWLPCGL
151 VYSASLYALG SGSAATGGLY MLAFALGTLP NLXAIGIFSL QLXKIMQNRY
201 IRLCTGLSVS LWALWKLAVL WL*

ORF103a and ORF103-1 show 97.7% identity in 222 aa overlap:

10 20 30 40 50 60 

orf 103a. pep MNXDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRXWLILLLNTGRVSSYTAI 
II I I I I I f f f I I I I I I i I II I I I ( 1 I I I I I I I f f I I I I I I I I I I I I i I I I M I I I M I 
orf 103-1 MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRVSSYTAI 
10 20 30 40 50 60

70 80 90 100 110 120



GLILGLIGQVGVSLDQTRVXQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 

I M t i | 1 1 | | ) I M ! I II I I I I I I I f I I I M I I I I I I H f t i i i I I I I ! I I I I I I I I I t 
GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 

70 80 90 100 110 120 

130 140 150 160 170 180 

NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP 

i | | t II ! ! I ! I ! ! I I I I I I I I! M M I II I II i f II I I 1 I II I I I I I M I I I I f I II f I I 
NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP 
130 140 150 160 170 180 

190 200 210 220 

NLXAIGIFSLQLXKIMQNRYIRLCTGLSVSLWALWKLAVLWLX 
I | MINIMI I I I I I I I I I I I I I M 1 1 I I I I 1 ) 1 1 I I 1 I 1 
NLLAIGIFSLQLKKIMQNRYIRLCTGLSVSLWALWKLAVLWLX 

190 200 210 220 



Homology with a predicted ORF from N. gonorrhoeae 

ORF103 shows 95.5% identity over a 222aa overlap with a predicted ORF (ORF103.ng) from N. gonorrhoeae (in each block, the upper sequence is orf103.pep and the lower is orf103ng):



MNHDITFLTLFLLGXFGGTHCIGMCGGLSSAFXXQLPPHINRFWLILLLNTGRVSSYTAI 60 

I | 1 1 I | | II I 1 I I I I I \ I I I I II I I I I I I I I i I I I I I M i I I I I M I 1 I I : I I I I II 
MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRISSYTAI 60 

GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 120 
I I : I I I M I : I : I I I I I i I I 1 I I I I I I : I 1 I I i 1) 1 1 I I I I 1 I I I 1 1 t I 1 I I 1 I I 1 I II I 
GLMLGLIGQLGISLDQTRVLQNILYTASNLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 120 

NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP 180 
MIIIIMIIMIimilMIMMIIIMIMMMMmmilMMIMIMM 
NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSATTGGLYMLAFALGTLP 180 

NLLAIGIFSLQLXKIMQNRYIRLCTGLSVSLWALWKLAVLWL 222 

milium) m i m i m m m 1 1 u m m 1 1 

NLLAIGIFSLQLKKIMQNRYIRLCTGLSVSLWALWKLAVLWL 222 



The complete length ORF103ng nucleotide sequence <SEQ ID 397> is:

1 ATGAACCACG ACATCACTTT CCTCACCCTG TTCCTGCTCG GTTTCTTCGG
51 CGGAACTCAC TGCATCGGTA TGTGCGGCGG ATTAAGCAGC GCGTTTGCGC
101 TCCAACTCCC CCCGCATATC AACCGCTTTT GGCTGATTCT GCTGCTTAAC
151 ACAGGACGGA TAAGCAGCTA TACGGCAATC GGCCTGATGC TCGGATTAAT
201 CGGACAACTC GGCATTTCAC TCGACCAAAc ccgcgTCCTG CAAAATATTT
251 tatacacagc ctccaaCCTC CTGCTGCTCT TTTTAGGCTT ATACTTGAGC
301 GGTATTTCTT CCTTGGCGGC AAAAATCGAG AAAATCGGCA AACCGATATG
351 GCGCAACCTG AACCCGATAC TCAACCGGCT GCTGCCCATA AAATCCATAC
401 CCGCCTGCCT TGCTGTCGGA ATATTATGGG GCTGGCTGCC GTGCGGACTG
451 GTTTACAGCG CATCACTTTA CGCGCTGGGA AGCGGTAGTG CGACAACCGG
501 CGGACTGTAT ATGCTTGCCT TTGCACTGGG TACGCTGCCC AATCTTTTGG
551 CAATCGGCAT TTTTTCCCTG CAACTGAAAA AAATCATGCA AAACCGATAT
601 ATCCGCCTGT GTACAGGATT ATCCGTATCA TTATGGGCAT TATGGAAGCT
651 TGCCGTCCTG TGGCTGTAA



This encodes a protein having amino acid sequence <SEQ ID 398>: 

1 MNHDITFLTL FLLGFFGGTH CIGMCGGLSS AFALQLPPHI NRFWLILLLN
51 TGRISSYTAI GLMLGLIGQL GISLDQTRVL QNILYTASNL LLLFLGLYLS
101 GISSLAAKIE KIGKPIWRNL NPILNRLLPI KSIPACLAVG ILWGWLPCGL
151 VYSASLYALG SGSATTGGLY MLAFALGTLP NLLAIGIFSL QLKKIMQNRY
201 IRLCTGLSVS LWALWKLAVL WL*

In addition, ORF103ng and ORF103-1 show 97.3% identity in 222 aa overlap: 

10 20 30 40 50 60 

orf 103-1 . pep MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRVSSYTAI 
i I I III IK Ml ill I III III II I III I ill Ml ill III M Ml Ml Mi I : I I I III 
orf103ng MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRISSYTAI

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 103-1 Pep GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 

11:11111 I: |:lt I MUM Hi I I I : I I I I I I I I I I Nil I I II II II UN MM II 
orfl03na GLMLGLIGQLGISLDQTRVLQNILYTASNLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 
y 70 80 90 100 110 120 

130 140 150 160 170 180 

orf 103-1 pep NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP 
j M M I M I I M M I M i I M I M 1 I I M I I I II I M II M M I M I M II M I M I M I 
orfl03ng NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSATTGGLYMLAFALGTLP 

130 140 150 160 170 180 

190 200 210 220 

orf 103-1. pep NLLAIGIFSLQLKKIMQNRYIRLCTGLSVSLWALWKLAVLWLX 
1 { I M M M II M M I M I I I II M M t I M I I I M M II I M 
orfl03ng NLLAIGIFSLQLKKIMQNRYIRLCTGLSVSLWALWKLAVLWLX 

190 200 210 220 
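
The percentage identity quoted for each overlap can be recomputed from a pair of aligned sequences. The following is a minimal sketch, assuming that identity is the number of identical aligned residues divided by the number of aligned (ungapped) positions; the function name `percent_identity`, the use of '-' as the gap character, and the treatment of X as an ordinary residue are assumptions, since the specification does not state how the alignment program counted such positions.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percentage of identical residues over the ungapped aligned positions.

    Both sequences must already be aligned, i.e. have the same length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where neither sequence has a gap
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no overlapping positions")
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    # Residues 1-60 of orf103-1.pep and orf103ng from the alignment above
    upper = "MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRVSSYTAI"
    lower = "MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRISSYTAI"
    print(round(percent_identity(upper, lower), 1))
```

Applied to the full 222 aa overlap rather than a single block, the same calculation gives the overall figure for the pair.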

Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 47 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 399>:

1 ATGGAAAACC AAAGGCCGCT CCTAGGCTTT CGCTTGGCAC TTTTGGCGGC
51 GATGACGTGG GGAACGCTGC CGAT.TCCGT GCGGCAGGTA TTGAAGTTTG
101 TCGATGCGCC GACGCTGGTG TGGGTGCGTT TTACCGTGGC GGCGGCGGTA
151 TTGTTTGTTT TGCTGGCACT GGGCGGGCGG CTGCcGAAGC GGCG^GGATT
201 TTTCTTGGTG CTCATTCAGG CTGCTGCTGC TCGGCGTGGC GGGCATTTCG
251 GCAAACTTTG TGCTGATTGC CCAAGGGCTG CATTATATTT CGCCGACCAC
301 GACGCAGGTT TTGTGGCAGA TTTCGCCGTT TACGATGATT GTwGTCGGTG
351 TGTTGGTGTT TAAAGACCGG ATGACTGCCG CTCAGAAAAT CGGCTTGGTT
401 TTGCTGCTTG CCGGTTTGCT TATGTATTTT AACGATAAAT TCGGCGAGTT
451 GTCGGGTTTG GGCGCGTATG C.AAGGGCGT GTTGCTGTGT GCGGCAGGCA
501 GTATGGCATG GGTGTGTAAT GCCGTGGCGC AAAAGCTGCT GTCGGCGCAA
551 TTCGGGCCGC AACAGATTCT GCTGTTGATT TATGCGGCAA GTGCCGCCGT
601 GTTCCTGCCG TTTGCCGAAC CGGCACACAT CGGAAGTATG GACGGTACGT
651 TGGCGTGGGT ATGTATTGCG TATTGCTGCT TGAATACGTT AATCGGTTAC
701 GGCTCGTTCG GCGAGGCGTT GAAACATTGG GAGGCTTCCA AAGTCAGCGC
751 GGTAACAACC TTGCTCCCCG TGTTTACCGT AATAAATACT TTGCTCGGGC
801 ATTATGTGAT GCCTGAAACT TTTGCCGCGC CGGA..



This corresponds to the amino acid sequence <SEQ ID 400; ORF104>:



1 MENQRPLLGF RLALLAAMTW GTLPXSVRQV LKFVDAPTLV WVRFTVAAAV 

51 LFVLLALGGR LPKRRDFSWC SFRLLLLGVA GISANFVLIA QGLHYISPTT 

101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLAGLL MYFNDKFGEL

151 SGLGAYXKGV LLCAAGSMAW VCNAVAQKLL SAQFGPQQIL LLIYAASAAV 

201 FLPFAEPAHI GSMDGTLAWV CIAYCCLNTL IGYGSFGEAL KHWEASKVSA 

251 VTTLLPVFTV INTLLGHYVM PETFAAP. . . 

Further work revealed a further partial DNA sequence <SEQ ID 401>:

1 ATGGAAAACC AAAGGCCGCT CCTAGGCTTC GCGTTGGCAC TTTTGGCGGC
51 GATGACGTGG GGAACGCTGC CGATTGCCGT GCGGCAGGTA TTGAAGTTTG
101 TCGATGCGCC GACGCTGGTG TGGGTGCGTT TTACCGTGGC GGCGGCGGTA
151 TTGTTTGTTT TGCTGGCACT GGGCGGGCGG CTGCCGAAGC GGCGGGATTT
201 TTCTTGGTGC TCATTCAGGC TGCTGCTGCT CGGCGTGGCG GGCATTTCGG
251 CAAACTTTGT GCTGATTGCC CAAGGGCTGC ATTATATTTC GCCGACCACG
301 ACGCAGGTTT TGTGGCAGAT TTCGCCGTTT ACGATGATTG TTGTCGGTGT
351 GTTGGTGTTT AAAGACCGGA TGACTGCCGC TCAGAAAATC GGCTTGGTTT
401 TGCTGCTTGC CGGTTTGCTT ATGTTTTTTA ACGATAAATT CGGCGAGTTG
451 TCGGGTTTGG GCGCGTATGC GAAGGGCGTG TTGCTGTGTG CGGCAGGCAG
501 TATGGCATGG GTGTGTTATG CCGTGGCGCA AAAGCTGCTG TCGGCGCAAT
551 TCGGGCCGCA ACAGATTCTG CTGTTGATTT ATGCGGCAAG TGCCGCCGTG
601 TTCCTGCCGT TTGCCGAACC GGCACACATC GGAAGTTTGG ACGGTACGTT
651 GGCGTGGGTT TGTTTTGCGT ATTGCTGCTT GAATACGTTA ATCGGTTACG
701 GCTCGTTCGG CGAGGCGTTG AAACATTGGG AGGCTTCCAA AGTCAGCGCG
751 GTAACAACCT TGCTCCCCGT GTTTACCGTA ATAwTwwCTT TGCTCGGGCA
801 TTATGTGATG CCTGAAACTT TTGCCGCGCC GGA...



This corresponds to the amino acid sequence <SEQ ID 402; ORF104-1>:

1 MENQRPLLGF ALALLAAMTW GTLPIAVRQV LKFVDAPTLV WVRFTVAAAV
51 LFVLLALGGR LPKRRDFSWC SFRLLLLGVA GISANFVLIA QGLHYISPTT
101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLAGLL MFFNDKFGEL
151 SGLGAYAKGV LLCAAGSMAW VCYAVAQKLL SAQFGPQQIL LLIYAASAAV
201 FLPFAEPAHI GSLDGTLAWV CFAYCCLNTL IGYGSFGEAL KHWEASKVSA
251 VTTLLPVFTV IXXLLGHYVM PETFAAP...

Computer analysis of this amino acid sequence gave the following results: 

Homology with hypothetical HI0878 protein of H. influenzae (accession number U32769) 

ORF104 and HI0878 show 40% aa identity in 277aa overlap: 

QRPLLGFRLALLAAMTWGTLPXSVRQVLKFVDAPTLWXXXXXXXXXXXXXXXXXXXXP- 62 
Q+PLLGF AL+ AM WG+LP +++QVL ++A T+VW P 
QQPLLGFT FALITAMAWGSLPIALKQVLSVMNAQTIVWYRFIIAAVSLLALLAYKKQLPE 62 

--KRRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIWGVLVF 120 

K R ++W ++L+GV G+++NF+L + L+YI P+ Q+ +S F M++ GVL+F 
LMKVRQYAW IMLIGVIGLTSNFLLFSSSLNYIEPSVAQIFIHLSSFGMLICGVLIF 118 



orfl04 


4 


HI0878 


3 


orfl04 


63 


HI0878 


63 


orfl04 


121 


HI0878 


119 


orfl04 


181 


HI0878 


179 


orfl04 


241 


HI0878 


238 



QKI 



++FND+F +GL Y GV+L G++ WV +AQKL+ 



+F QQILL++Y 



F+P A+ + + 



LA +C YCCLNTLIGYGS+ EAL 



W+ SKVS V TL+P+FT++ + + HY 



FAAP 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF104 shows 95.3% identity over a 277aa overlap with an ORF (ORF104a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf 104 . pep MENQRPLLGFRLALLAAMTWGTLPXSVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR 

II I I I I I I I I I I I I I I I I I I II I : I I I I I I I I i I I I I I I I I I I II I I ! I II I I I t I I I 
orf 104a MENQR P L LG FALAL LAAMTWG T L P I AVRQVLK FV DA PT LVW VR FT VAAAVL FVLLALGGR 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 104 .pep LPKRRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIWGVLVF 

III I I I I I I II I II I I I II I M I I I II I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I 
orf 104a LPKWRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIWGVLVF 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf104.pep KDRMTAAQKIGLVLLLAGLLMYFNDKFGELSGLGAYXKGVLLCAAGSMAWVCNAVAQKLL
| | | | | | | | | | M I I I I I I I I I : I I I I I I M I ! I I I I M I I I I I I I I II I I I I I II I II
orf 104a KDRMTAAQKIGLVLLLAGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL 
130 140 150 160 170 180 

190 200 210 220 230 240 

orfl04 pep SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSMDGTLAWVCIAYCCLNTLIGYGSFGEAL 
I M | 1 I I I I I I I I I I i I I I I I i I I I < I I I I I : I I I I I M I : I I I I I I I < t II I I I I I I I 
orfl04a SAQFGPQQILLLIYAASAAVFLPFAELAHIGSLDGTLAWVCFAYCCLNTLIGYGSFGEAL 

190 200 210 220 230 240 

250 260 270 

orf 104 . pep KHWEASKVSAVTTLLPVFTVINTLLGHYVMPETFAAP 
I I I I I I I I I I II I I I I I I I I I : I I I I I I I I : I I I I I 
orf 104a KHWEASKVSAVTTLLPVFTVIFSLLGHYVMPDTFAAPDMNGLGYAGALWVGGAVTAAVG 

250 260 270 280 290 300 

The complete length ORF104a nucleotide sequence <SEQ ID 403> is:

1 ATGGAAAACC AAAGGCCGCT CCTAGGCTTC GCGTTGGCAC TTTTGGCGGC
51 GATGACGTGG GGAACGCTGC CGATTGCCGT GCGGCAGGTA TTGAAGTTTG
101 TCGATGCGCC GACGCTGGTG TGGGTGCGTT TTACCGTGGC GGCGGCGGTA
151 TTGTTTGTTT TGCTGGCATT GGGCGGGCGG CTGCCGAAGT GGCGGGATTT
201 TTCTTGGTGC TCATTCAGGC TGCTGCTGCT CGGCGTGGCG GGCATTTCGG
251 CAAACTTTGT GCTGATTGCC CAAGGGCTGC ATTATATTTC GCCGACCACG
301 ACGCAGGTTT TGTGGCAGAT TTCGCCGTTT ACGATGATTG TTGTCGGTGT
351 GTTGGTGTTT AAAGACCGGA TGACTGCCGC TCAGAAAATC GGCTTGGTTT
401 TGCTGCTTGC CGGTTTGCTT ATGTTTTTTA ACGATAAATT CGGCGAGTTG
451 TCGGGTTTGG GCGCGTATGC GAAGGGCGTG TTGCTGTGTG CGGCAGGCAG
501 TATGGCATGG GTGTGTTATG CCGTGGCGCA AAAGCTGCTG TCGGCGCAAT
551 TCGGGCCGCA ACAGATTCTG CTGTTGATTT ATGCGGCAAG TGCCGCCGTG
601 TTCCTGCCGT TTGCCGAACT GGCACACATC GGAAGTTTGG ACGGTACGTT
651 GGCGTGGGTT TGTTTTGCGT ATTGCTGCTT GAATACGTTA ATCGGTTACG
701 GCTCGTTCGG CGAGGCGTTG AAACATTGGG AGGCTTCCAA AGTCAGCGCG
751 GTAACAACCT TGCTCCCCGT GTTTACCGTA ATATTTTCTT TGCTCGGGCA
801 TTATGTGATG CCTGATACTT TTGCCGCGCC GGATATGAAC GGTTTGGGTT
851 ATGCCGGCGC ACTGGTCGTG GTCGGGGGTG CGGTTACGGC GGCGGTGGGG
901 GACAGGCTGT TCAAACGCCG CTAG



This encodes a protein having amino acid sequence <SEQ ID 404>: 



1 MENQRPLLGF ALALLAAMTW GTLPIAVRQV LKFVDAPTLV WVRFTVAAAV
51 LFVLLALGGR LPKWRDFSWC SFRLLLLGVA GISANFVLIA QGLHYISPTT
101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLAGLL MFFNDKFGEL
151 SGLGAYAKGV LLCAAGSMAW VCYAVAQKLL SAQFGPQQIL LLIYAASAAV
201 FLPFAELAHI GSLDGTLAWV CFAYCCLNTL IGYGSFGEAL KHWEASKVSA
251 VTTLLPVFTV IFSLLGHYVM PDTFAAPDMN GLGYAGALVV VGGAVTAAVG
301 DRLFKRR*

ORF104a and ORF104-1 show 98.2% identity in 277 aa overlap:



10 20 30 40 50 60 

orf 104a . pep MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR 
I I I I I I I I I I I I I I II I I I I I II M I I I I I I I II I I I II I I II I M I I I I M I I I I I I I I 
O r f 1 0 4 - 1 MENQRPLLGFALALLAAMTWGTLPI AVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 104a. pep LPKWRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF 
Ml I I I I I I I i I I I I I I I I I I II I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I II 
orf 104-1 LPKRRDFSWCS FRLLLLGVAGI SANFVLIAQGLHYI S PTTTQVLWQI S PFTMI WGVLVF 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf 104a . pep KDRMTAAQKIGLVLLLAGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL 
I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I 1 I I I I I I I I I I I M I I | | | M | | | 
orf 104-1 KDRMTAAQKIGLVLLLAGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL 

130 140 150 160 170 180 

190 200 210 220 230 240

orf104a.pep SAQFGPQQILLLIYAASAAVFLPFAELAHIGSLDGTLAWVCFAYCCLNTLIGYGSFGEAL
| t | | 1 I t I II I I I I 1 I M 1 I I I ! f M i I i I I I i i M It i t I I M I I I M I 111 111 Ml
orf104-1 SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSLDGTLAWVCFAYCCLNTLIGYGSFGEAL
190 200 210 220 230 240

250 260 270 280 290 300 

o r f 1 0 4 a pep KHWEASKVSAVTTLLPVFTVI FS LLGH YVMP DT FAAPDMNGLGYAGALVVVGGAVTAAVG 

I I I III 11 I I I I I I I I I I ! 1 1 I I I 1 I I I I : I I I I I 
orf 104-1 KHWEAS KV S AVT T LL PVFT V I XXLLGHYVMPE T FAAP 

250 260 270 

Homology with a predicted ORF from N. gonorrhoeae 

ORF104 shows 93.9% identity over a 277aa overlap with a predicted ORF (ORF104.ng) from N. 
gonorrhoeae: 

orf 104 pep MENQRPLLGFRLALLAAMTWGTLPXSVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR 60 

IliMMIM mtimiMM :!MliMlllllMIIIMIi!!t!llilMIM 
orfl04ng MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR 60 

orf 104. pep LPKRRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIWGVLVF 120 

I | 1 ( I i I i I | I I ! 1 I I I !:! I I M I II 1 1 1 I I I 1 I I I I M I I I M M II 1 I I I I I 11 i I 
orfl04ng LPKRRDFSWHSFRLLLLGVTGISANFVLIAQGLHYISPTTTQVLWQISPFTMIWGVLVF 120 

orf 104 . pep KDRMTAAQK1GLVLLLAGLLMYFNDKFGELSGLGAYXKGVLLCAAGSMAWVCNAVAQKLL 180 

: : ! I I I i I ! i I 1 I 1 I 1 : I I ! : : I : I I I I i ! M i : ! I I I 11 i I I I f M I I 1 lllllil 
orfl04ng KDRMTAAQKIGLVLLLVGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL 180 

orf 104 . pep SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSMDGTLAWVCIAYCCLNTLIGYGSFGEAL 240 

I I i i ! It I I 1 I I I 1 I I 1 I I I 1 I llllllll:llllllll::llllllllltlllllll 
orf 104ng SAQFGPQQILLLIYAASAAVFLLXAEPAHIGSLDGTLAWVCFVYCCLNTLIGYGSFGEAL 24 0 

orf 104 .pep KHWEASKVSAVTTLLPVFTVINTLLGHYVMPETFAAP 277 

I II I I I I II I II I I I I I I I I I : I I I I I I I I : I I I I 1 
orfl04ng KHWEASKVSAVTTLLPVFTVIFSLLGHYVMPDTFAAPDMNGLGYVGALVWGGAVTAAVG 300 

The complete length ORF104ng nucleotide sequence <SEQ ID 405> is predicted to encode a 
protein having amino acid sequence <SEQ ID 406>: 

1 MENQRPLLGF ALALLAAMTW GTLPIAVRQV LKFVDAPTLV WVRFTVAAAV
51 LFVLLALGGR LPKRRDFSWH SFRLLLLGVT GISANFVLIA QGLHYISPTT
101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLVGLL MFFNDKFGEL
151 SGLGAYAKGV LLCAAGSMAW VCYAVAQKLL SAQFGPQQIL LLIYAASAAV
201 FLLXAEPAHI GSLDGTLAWV CFVYCCLNTL IGYGSFGEAL KHWEASKVSA
251 VTTLLPVFTV IFSLLGHYVM PDTFAAPDMN GLGYVGALVV VGGAVTAAVG
301 DRPFKRR*

Further work revealed the complete gonococcal nucleotide sequence <SEQ ID 407>:

1 ATGGAAAACC AAAGGCCGCT CCTAGGCTTC GCGTTGGCAC TTTTGGCGGC 

51 GATGACGTGG GGGACGCTGC CGATTGCCGT GCGGCAGGTA TTGAAGTTTG 

101 TCGATGCGCC GACGCTGGTG TGGGTGCGTT TTACCGTGGC GGCGGCGGTA 

151 TTGTTTGTTT TGCTGGCATT GGGCGGGCGG CTGCCGAAGC GGCGGGATTT 

201 TTCTTGGCAT TCATTCAGGC TGCTGCTGCT CGGCGTGACG GGCATTTCGG 

251 CAAACTTTGT GCTGATTGCC CAAGGGCTGC ATTATATTTC GCCGACCACG 

301 ACGCAGGTTT TGTGGCAGAT TTCGCCGTTT ACGATGATTG TTGTCGGCGT 

351 GTTGGTGTTT AAAGACCGGA tgaCTGCCGC GCAGAAAATC GGTTTGGTTT 

401 TGCTGCttgT CGGTttgCTT ATGTTTTtta ACGACAAATT CGGCGAGTTG 

451 TCGGGTTTGG GCGCGTATGC GAAGGGCGTG TTGCTGTGTG CGGCAGGCAG 

501 TATGGCCTGG GTGTGTTATG CCGTGGCGCA AAAGCTGCTG TCGGCGCAAT 

551 TCGGGCCGCA ACAGATTCTG CTGTTGATTT ATGCGGcaag tgccgccGTG 

601 TTCCtgccgT TTGccgaaCC GGCACACATC GGAAGTTTgg aCGGTACGtt 

651 GGCGTGGGTT TGTTTTGTGT ATTGCTGCTT GAATACGTTA ATCGGTTACG 

701 GCTCGTTCGG CGAGGCGTTG AAACATTGGG AGGCTTCCAA AGTCAGCGCG 

751 GTAACAACCT TGCTCCCCGT GTTTACCGTA ATATTTTCTT TGCTCGGGCA 

801 TTATGTGATG CCTGATACTT TTGCCGCGCC GGATATGAAC GGTTTGGGTT

851 ATGTCGGCGC ACTGGTCGTG GTCGGGGGTG CGGTTACGGC GGCGGTGGGG
901 GACAGGCCGT TCAAACGCCG CTAG 

This corresponds to the amino acid sequence <SEQ ID 408; ORF104ng-1>:

1 MENQRPLLGF ALALLAAMTW GTLPIAVRQV LKFVDAPTLV WVRFTVAAAV
51 LFVLLALGGR LPKRRDFSWH SFRLLLLGVT GISANFVLIA QGLHYISPTT
101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLVGLL MFFNDKFGEL
151 SGLGAYAKGV LLCAAGSMAW VCYAVAQKLL SAQFGPQQIL LLIYAASAAV
201 FLPFAEPAHI GSLDGTLAWV CFVYCCLNTL IGYGSFGEAL KHWEASKVSA
251 VTTLLPVFTV IFSLLGHYVM PDTFAAPDMN GLGYVGALVV VGGAVTAAVG
301 DRPFKRR*

ORF104ng-1 and ORF104-1 show 97.5% identity in 277 aa overlap:

10 20 30 40 50 60

orf104-1.pep MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR
i | I I i I I t M I ( I I I I I I I t I I II I t 1 t I I I I I I I I I I I 1 t I I M I I I I 1 I i I I I II I I I
orf104ng-1 MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR

10 20 30 40 50 60

70 80 90 100 110 120

orf104-1.pep LPKRRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF
I II I I I I II I I I I I I I I I : I I I I I I I I I I I I I I II I I I i I I 1 1 I I I I 1 I I I I I I I M I I
orf104ng-1 LPKRRDFSWHSFRLLLLGVTGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF

70 80 90 100 110 120

130 140 150 160 170 180

orf104-1.pep KDRMTAAQKIGLVLLLAGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL
( i I t ( I I I I I I I II I I : I I i I I I I I I I II I I I II II I II I I I II I I I I I I I I I II II II 1
orf104ng-1 KDRMTAAQKIGLVLLLVGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL

130 140 150 160 170 180

190 200 210 220 230 240

orf104-1.pep SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSLDGTLAWVCFAYCCLNTLIGYGSFGEAL
I I I I I I M I I I I I II I I I I I I I 1 I I I ! I I I 1 I I 1 I I I I II i I : I I I ! I I II 1 I M I I I M
orf104ng-1 SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSLDGTLAWVCFVYCCLNTLIGYGSFGEAL
190 200 210 220 230 240

250 260 270

orf104-1.pep KHWEASKVSAVTTLLPVFTVIXXLLGHYVMPETFAAP
I I I I I I I I I I I I I I II I 1 I I I 11111111:11111
orf104ng-1 KHWEASKVSAVTTLLPVFTVIFSLLGHYVMPDTFAAPDMNGLGYVGALVVVGGAVTAAVG
250 260 270 280 290 300

In addition, ORF104ng-1 shows significant homology with a hypothetical H. influenzae protein:



gi|1573895 (U32769) hypothetical [Haemophilus influenzae] Length = 306
Score = 237 bits (598), Expect = 8e-62

Identities = 114/280 (40%), Positives = 168/280 (59%), Gaps = 8/280 (2%)

Query: 30 QRPXXXXXXXXXXXMTWGTLPIAVRQVLKFVDAPTLVWXXXXXXXXXXXXXXXXXXXXP- 88
          Q+P M WG+LPIA++QVL ++A T+VW P
Sbjct: 3 QQPLLGFTFALITAMAWGSLPIALKQVLSVMNAQTIVWYRFIIAAVSLLALLAYKKQLPE 62

Query: 89 --KRRDFSWHSFRLLLLGVTGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF 146
          K R ++W ++L+GV G+++NF+L + L+YI P+ Q+ +S F M++ GVL+F
Sbjct: 63 LMKVRQYAW IMLIGVIGLTSNFLLFSSSLNYIEPSVAQIFIHLSSFGMLICGVLIF 118

Query: 147 KDRMTAAQKIXXXXXXXXXXMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL 206
          K+++ QKI +FFND+F +GL Y+ GV+L G++ WV Y +AQKL+
Sbjct: 119 KEKLGLHQKIGLFLLLIGLGLFFNDRFDAFAGLNQYSTGVILGVGGALIWVAYGMAQKLM 178

Query: 207 SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSLDGTLAWVCFVYCCLNTLIGYGSFGEAL 266
          +F QQILL++Y A F+P A+ + + L LA +CF+YCCLNTLIGYGS+ EAL
Sbjct: 179 LRKFNSQQILLMMYLGCAIAFMPMADFSQVQELT-PLALICFIYCCLNTLIGYGSYAEAL 237

Query: 267 KHWEASKVSAVTTLLPVFTVIFSLLGHYVMPDTFAAPDMN 306
          W+ SKVS V TL+P+FT++FS + HY P FAAP++N
Sbjct: 238 NRWDVSKVSVVITLVPLFTILFSHIAHYFSPADFAAPELN 277

Based on this analysis, including the presence of a putative leader sequence and several putative 
transmembrane domains in the gonococcal protein, it is predicted that the proteins from 
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or 
diagnostics, or for raising antibodies. 
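
The identity figures quoted in this example (for instance 97.5% over a 277 aa overlap) express the share of aligned positions that carry the same residue. A minimal Python sketch of that arithmetic follows. The function name `percent_identity` is hypothetical, and the treatment of gaps and of the unknown residue X as non-matching is an assumption, since the text does not state how the alignment software scored them.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return the percentage of alignment columns holding identical residues.

    Both strings must already be aligned to the same length ('-' marks a gap).
    Assumption: columns containing a gap or the unknown residue 'X'
    never count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a not in "-X"
    )
    return 100.0 * matches / len(aligned_a)


# Residues 61-80 of ORF104-1 and ORF104ng-1, copied from the alignment above.
orf104_1 = "LPKRRDFSWCSFRLLLLGVA"
orf104ng_1 = "LPKRRDFSWHSFRLLLLGVT"

# 18 of the 20 columns match (C/H and A/T differ).
print(f"{percent_identity(orf104_1, orf104ng_1):.1f}% identity")
```

Applied to a full pairwise alignment, the same count divided by the overlap length gives a figure comparable to those reported here, provided the same conventions for X and gaps are used.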

Example 48 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 409>: 

1 ATGGTAGCTC GTCGGGCTCA TAACCCGAAG GTCGTAGGTT CGAATCCTGT 

51 .CCCGCAACC TAATTTCAAA CCCCTCGGTT CAATGCCGAG GG. GTTTTGT 

101 T . TTGCCTGT TTCCTGTTTC CTGTTTCCTG CCGCCTCCGT TTTTTGCCGG 

151 ATTTTCCTTC CGGCCGCAAT ATCGGAACGG CAGACCGCCG TCTGTTTGCG 

201 GTTGCAAATT CAGGCAGTTT GGCTACAATC TTCCGCATTG TCTTCAAGAA 

251 AGCCAACCAT GCCGACCGTC CGTTTTACCG AATCCGTCAG CAAACAAGAC 

301 CTTGATGCTC TGTTCGAGTG GGCAAAAGCA AGTTACGGTG CAGAAAGTTG 

351 CTGGAAAACG CTGTATCTGA ACGGTCysCC TTTGGGCAAC CTGTCGCCGG 

401 AATGGGTGGA ACGCGTsmmA AAAGACTGGG AGGCAGGCTG CyCGGAGTCT 

451 TCAGACGGCA TTTTTCTGAA TgCGGACGGc TGgCctGATA TGGgCGGAcg 

501 cTTACAGCAC CTCGCCCTCG GTTGGCACTG TGCGGGGCTG TTGGACGgsT 

551 GGCGCAACGA GTGTTTCGAC CTGACCGACG GCGGCGGCAA CCCCTTGTTC 

601 ACGCTCGaAc GCGCCGyTTT mCGTCCTkTC GGACTGCTCA GCCGCGCCGT 

651 CCATCTCAAC GGTCTGACCG AATCGGACGG CCGATGGCAT TTCTGGATAG 

701 GCAGGCGCAG TCCGCACAAA GCAGTCGATC CCAACAAACT CGACAATACT 

751 rCCGCCGGCG GTGTTTCCGG CGGCGAAATG CCGTCTGAAG CCGTGTGTCG 

801 CGAAAGCAGC GAAGAAGCCG GTTTGGATAA AACGCTGcTT CCGCTCATCC 

851 GCCCGGTATC GCAGCTGCAC AGCCTGCGCT CCGTCAGCCG GGGTGTACAC 

901 AATGAAATCC TGTATGTATT CGATGCCGTC CTGCCG... 

This corresponds to the amino acid sequence <SEQ ID 410; ORF105>: 

1 MVARRAHNPK VVGSNPXPAT XFQTPRFNAE XVLXLPVSCF LFPAASVFCR

51 IFLPAAISER QTAVCLRLQI QAVWLQSSAL SSRKPTMPTV RFTESVSKQD

101 LDALFEWAKA SYGAESCWKT LYLNGXPLGN LSPEWVERVX KDWEAGCXES 

151 SDGIFLNADG WPDMGGRLQH LALGWHCAGL LDGWRNECFD LTDGGGNPLF 

201 TLERAXXRPX GLLSRAVHLN GLTESDGRWH FWIGRRSPHK AVDPNKLDNT 

251 XAGGVSGGEM PSEAVCRESS EEAGLDKTLL PLIRPVSQLH SLRSVSRGVH 

301 NEILYVFDAV LP. . . 

Further work revealed the complete nucleotide sequence <SEQ ID 411>:



1 ATGCCGACCG TCCGTTTTAC CGAATCCGTC AGCAAACAAG ACCTTGATGC 

51 TCTGTTCGAG TGGGCAAAAG CAAGTTACGG TGCAGAAAGT TGCTGGAAAA 

101 CGCTGTATCT GAACGGTCTG CCTTTGGGCA ACCTGTCGCC GGAATGGGTG 

151 GAACGCGTCA AAAAAGACTG GGAGGCAGGC TGCTCGGAGT CTTCAGACGG 

201 CATTTTTCTG AATGCGGACG GCTGGCCTGA TATGGGCGGA CGCTTACAGC 

251 ACCTCGCCCT CGGTTGGCAC TGTGCGGGGC TGTTGGACGG CTGGCGCAAC 

301 GAGTGTTTCG ACCTGACCGA CGGCGGCGGC AACCCCTTGT TCACGCTCGA 

351 ACGCGCCGCT TTCCGTCCTT TCGGACTGCT CAGCCGCGCC GTCCATCTCA 

401 ACGGTCTGAC CGAATCGGAC GGCCGATGGC ATTTCTGGAT AGGCAGGCGC 

451 AGTCCGCACA AAGCAGTCGA TCCCAACAAA CTCGACAATA CTGCCGCCGG

501 CGGTGTTTCC GGCGGCGAAA TGCCGTCTGA AGCCGTGTGT CGCGAAAGCA 

551 GCGAAGAAGC CGGTTTGGAT AAAACGCTGC TTCCGCTCAT CCGCCCGGTA 

601 TCGCAGCTGC ACAGCCTGCG CTCCGTCAGC CGGGGTGTAC ACAATGAAAT 

651 CCTGTATGTA TTCGATGCCG TCCTGCCCGA AACCTTCCTG CCTGAAAATC 

701 AGGATGGCGA AGTGGCGGGT TTTGAGAAAA TGGACATCGG CGGTCTGTTG

751 GATGCCATGT TGTCGGGAAA CATGATGCAC GACGCGCAAC TGGTTACGCT 

801 GGACGCGTTT TGCCGTTACG GTCTGATTGA TGCCGCCCAT CCGCTGTCCG 

851 AGTGGCTGGA CGGCATACGT TTATAG 



This corresponds to the amino acid sequence <SEQ ID 412; ORF105-1>:

1 MPTVRFTESV SKQDLDALFE WAKASYGAES CWKTLYLNGL PLGNLSPEWV
51 ERVKKDWEAG CSESSDGIFL NADGWPDMGG RLQHLALGWH CAGLLDGWRN
101 ECFDLTDGGG NPLFTLERAA FRPFGLLSRA VHLNGLTESD GRWHFWIGRR
151 SPHKAVDPNK LDNTAAGGVS GGEMPSEAVC RESSEEAGLD KTLLPLIRPV
201 SQLHSLRSVS RGVHNEILYV FDAVLPETFL PENQDGEVAG FEKMDIGGLL
251 DAMLSGNMMH DAQLVTLDAF CRYGLIDAAH PLSEWLDGIR L*



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF105 shows 89.4% identity over a 226aa overlap with an ORF (ORF105a) from strain A of N.
meningitidis:

60 70 80 90 100 110
orf105.pep ISERQTAVCLRLQIQAVWLQSSALSSRKPTMPTVRFTESVSKQDLDALFEWAKASYGAES
                                         I t I I I i I I I I I f : I t I I I 1 I I I II ! I I M I
orf105a                                  MPTVRFTESVSKHDLDALFEWAKASYGAES
10 20 30

120 130 140 150 160 170
orf105.pep CWKTLYLNGXPLGNLSPEWVERVXKDWEAGCXESSDGIFLNADGWPDMGGRLQHLALGWH
I I I I I I I I I I I I I I I I I I : I I I I I I I I I I II I I I I I I I I I I I I 1 I I I I I 1 I I I :
orf105a    CWKTLYLNGLPLGNLSPEWAERVKKDWEAGCSESSDGIFLNADGWPDMGRRLQHLARIWK
40 50 60 70 80 90

180 190 200 210 220 230
orf105.pep CAGLLDGWRNECFDLTDGGGNPLFTLERAXXRPXGLLSRAVHLNGLTESDGRWHFWIGRR
I M I II I : I I II I I I I I : I II I : I M I II ! M 1 I ( i I I I I I : I I M I I I i I I I I I
orf105a    EAGLLHGWRDECFDLTDGGSNPLFALERAAFRPFGLLSRAVHLNGLVESDGRWHFWIGRR
100 110 120 130 140 150

240 250 260 270 280 290
orf105.pep SPHKAVDPNKLDNTXAGGVSGGEMPSEAVCRESSEEAGLDKTLLPLIRPVSQLHSLRSVS
I | I I I I I I : I I M I I I I I I : I I : I I I : I I I I I II I I I I I I I I I I I I M I I I I I I I I II
orf105a    SPHKAVDPDKLDNTAAGGVSSGELPSETVCRESSEEAGLDKTLLPLIRPVSQLHSLRPVS
160 170 180 190 200 210

300 310
orf105.pep RGVHNEILYVFDAVLP
I I II 1 I I I I I I I I I I I
orf105a    RGVHNEILYVFDAVLPETFLPENQDGEVAGFEKMDIGGLLAAMLSGNMMHDAQLVTLDAF
220 230 240 250 260 270



The complete length ORF105a nucleotide sequence <SEQ ID 413> is:

1 ATGCCGACCG TCCGTTTTAC CGAATCCGTC AGCAAACACG ACCTTGATGC
51 CCTATTCGAG TGGGCAAAGG CAAGTTACGG TGCGGAAAGT TGCTGGAAAA
101 CGCTGTATCT GAACGGTCTG CCTTTGGGCA ATCTGTCGCC GGAATGGGCG
151 GAGCGCGTCA AAAAAGACTG GGAGGCAGGC TGCTCGGAGT CTTCAGACGG
201 CATTTTCCTG AATGCGGACG GCTGGCCAGA TATGGGCAGA CGCTTGCAGC
251 ACCTCGCCCG AATATGGAAA GAAGCGGGAC TGCTTCACGG CTGGCGCGAC
301 GAGTGTTTCG ACCTGACCGA CGGCGGCAGC AATCCCTTGT TCGCGCTCGA
351 ACGCGCCGCT TTCCGTCCGT TCGGACTGCT CAGCCGCGCC GTCCATCTCA
401 ACGGTTTGGT CGAATCGGAC GGCCGATGGC ATTTCTGGAT AGGCAGGCGC
451 AGTCCGCACA AAGCAGTCGA TCCCGACAAA CTCGACAATA CTGCCGCCGG
501 CGGTGTTTCC AGCGGTGAAT TGCCGTCTGA AACCGTGTGT CGCGAAAGCA
551 GCGAAGAAGC CGGTTTGGAT AAAACGCTGC TTCCGCTCAT CCGCCCGGTA
601 TCGCAGCTGC ACAGCCTGCG CCCCGTCAGC CGGGGTGTGC ACAATGAAAT
651 CCTGTATGTA TTCGATGCCG TCCTGCCCGA AACCTTCCTG CCTGAAAATC
701 AGGATGGCGA AGTGGCGGGT TTTGAGAAAA TGGACATCGG CGGTCTGTTG
751 GCTGCCATGT TGTCGGGAAA CATGATGCAC GACGCGCAAC TGGTTACGCT
801 GGACGCGTTT TGCCGTTACG GTCTGATTGA TGCCGCCCAT CCGCTGTCCG
851 AGTGGCTGGA CGGCATACGT TTATAG



This encodes a protein having amino acid sequence <SEQ ID 414>:

1 MPTVRFTESV SKHDLDALFE WAKASYGAES CWKTLYLNGL PLGNLSPEWA
51 ERVKKDWEAG CSESSDGIFL NADGWPDMGR RLQHLARIWK EAGLLHGWRD
101 ECFDLTDGGS NPLFALERAA FRPFGLLSRA VHLNGLVESD GRWHFWIGRR
151 SPHKAVDPDK LDNTAAGGVS SGELPSETVC RESSEEAGLD KTLLPLIRPV
201 SQLHSLRPVS RGVHNEILYV FDAVLPETFL PENQDGEVAG FEKMDIGGLL
251 AAMLSGNMMH DAQLVTLDAF CRYGLIDAAH PLSEWLDGIR L*



ORF105a and ORF105-1 show 93.8% identity in 291 aa overlap:

10 20 30 40 50 60

orf105a.pep MPTVRFTESVSKHDLDALFEWAKASYGAESCWKTLYLNGLPLGNLSPEWAERVKKDWEAG
| f | | I | | I I f| I : t 1 i I I I I I I i I I M I I I I I f I I I I I I I I t I I I I t ! 1 : t I I I I I I I 1 ! 
orf 105-1 MPTVRFTESVSKQDLDALFEWAKASYGAESCWKTLYLNGLPLGNLSPEWVERVKKDWEAG 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 105a . pep CSESSDGIFLNADGWPDMGRRLQHLARIWKEAGLLHGWRDECFDLTDGGSNPLFALERAA 
I i | | | I I t I I I I I I I I t I I 1 1 I i I I I : MM M !: II I I I I I I I : I M I : I I I I I 
orf 105-1 CSESSDGIFLNADGWPDMGGRLQHLALGWHCAGLLDGWRNECFDLTDGGGNPLFTLERAA 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 105a . pep FRPFGLLSRAVHLNGLVESDGRWHFWIGRRSPHKAVDPDKLDNTAAGGVS SGELPSETVC 
I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I : I I : I I I : I I 
orf 105-1 FRPFGLLSRAVHLNGLTESDGRWHFWIGRRSPHKAVDPNKLDNTAAGGVSGGEMPSEAVC 
130 140 150 160 170 180 

190 200 210 220 230 240 

orf 105a. pep RES SEEAGLDKT LLPLIRPVSQLHS LRPVSRGVHNE I LYVFDAVLPET FLPENQDGEVAG 
I i I M II I I M M I I M i I I f I I I I M I ! I I I i I M M f I M ! i I M M II I 1 I f I I M 
orf 105-1 RESSEEAGLDKTLLPLIRPVSQLHSLRSVSRGVHNEILYVFDAVLPETFLPENQDGEVAG 

190 200 210 220 230 240 

250 260 270 280 290 

orf 105a. pep FEKMDIGGLLAAMLSGNMMHDAQLVTLDAFCRYGLIDAAHPLSEWLDGIRLX 
I I I M I It I I M I M I M I I I I I I I I M I I I II II I I I II M I M I M I It 
orf 105-1 FEKMDIGGLLDAMLSGNMMHDAQLVTLDAFCRYGLIDAAHPLSEWLDGIRLX 

250 260 270 280 290 

Homology with a predicted ORF from N. gonorrhoeae 

ORF105 shows 87.5% identity over a 312aa overlap with a predicted ORF (ORF105.ng) from N.
gonorrhoeae:

orf 105 . pep MVARRAHNPKWGSNPXPATXFQTPRFNAEXVLXLPVSCFLFPAASVFCRIFLPAAISER 60 

i M I I I I It I I II I I I III : I I II I II I i I I I I I II I I I I II I I I t I 1 I I I 

orflOSng MVARRAHNPKVVGSNPAPATKYQTPRFNAEGVLF FLFPAASVFCRIFLPAAISER 55 

orf 105 . pep QTAVCLRLQIQAVWLQSSALSSRKPTMPTVRFTESVSKQDLDALFEWAKASYGAESCWKT 120 

I : I M I M I M M I I I I M I I I II : I I I I M II II M I M I I 1 I I I I I I I II M I I II 
orflOSng QAAVCLRLQIQAVWLQSSALCSRKPAMPTVRFTESVSKQDLDALFERAKASYGAESCWKT 115 

orf 105. pep LYLNGXPLGNLSPEWVERVXKDWEAGCXESSDGIFLNADGWPDMGGRLQHLALGWHCAGL 180 

MM I I i M I I II : I I : M I II M I II : I M M I M II M M I 1 II II t : III 
orflOSng LYLNRLPLGNLSPEWAERIKKDWEAGCSESSNGIFLNADGWPDMGGRLQHLARTWNECAGL 175 

or f 1 05 . pep LDGWRNECFDLTDGGGNPLFTLERAXXRPXGLLSRAVHLNGLTESDGRWHFWIGRRSPHK 24 0 

I I M M M II I I I II I I I I I I II I II III I I II I I I I : I I : I I M II M II M II 
orf 105ng LHGWRNECFDLTDGGGNPLFTLERAAFRPFGLLIRAVHLNGLVESNGRWHFWIGRRSPHK 235 

orf 105 . pep AVDPNKLDNTXAGGVSGGEMPSEAVCRESSEEAGLDKTLLPLIRPVSQLHSLRSVSRGVH 300 

I II I : II M : I I M I M I I M II II M I I I M I M M : I I I I I II : I M M I II M I 
orflOSng AVDPGKLDNIAGGGVSGGEMPSEAVCRESSEEAGLDKTLFPLIRPVSRLHSLRPVSRGVH 295 

orf 105. pep NE I L YVFDAVL P 312 
I I I I I II I I M I 

orf105ng NEILYVFDAVLPETFLPENQDGEVAGFEKMDIGGLLDAMLSKNMMHDAQLVTLDAFYRYG 355

A complete length ORF105ng nucleotide sequence <SEQ ID 415> was predicted to encode a
protein having amino acid sequence <SEQ ID 416>:

1 MVARRAHNPK VVGSNPAPAT KYQTPRFNAE GVLFFLFPAA SVFCRIFLPA

51 AISERQAAVC LRLQIQAVWL QSSALCSRKP AMPTVRFTES VSKQDLDALF 

101 ERAKASYGAE SCWKTLYLNR LPLGNLSPEW AERIKKDWEA GCSESSNGIF 

151 LNADGWPDMG GRLQHLARTW NKAGLLHGWR NECFDLTDGG GNPLFTLERA 

201 AFRPFGLLIR AVHLNGLVES NGRWHFWIGR RSPHKAVDPG KLDNIAGGGV 

251 SGGEMPSEAV CRESSEEAGL DKTLFPLIRP VSRLHSLRPV SRGVHNEILY 

301 VFDAVLPETF LPENQDGEVA GFEKMDIGGL LDAMLSKNMM HDAQLVTLDA 

351 FYRYGLIDAA HPLSEWLDGI RL* 

Further work revealed the complete nucleotide sequence <SEQ ID 417>:



1 ATGCCGACCG TCCGTTTTAC CGAATCCGTC AGCAAACAAG ACCTTGATGC 

51 CCTGTTCGAG CGGGCAAAAG CAAGTTACGG TGCCGAAAGT TGCTGGAAAA 

101 CGCTGTATCT GAACCGTCTT CCTTTGGGCA ATCTGTCGCC GGAATGGGCT 

151 GAGCGCATCA AAAAAGACTG GGAGGCAGGC TGCTCCGAGT CTTCAGACGG 

201 CATTTTTCTG AATGCGGACG GCTGGCCGGA TATGGGCGGA CGCTTGCAGC 

251 ACCTCGCCCG CACATGGAAC AAGGCGGGGC TGCTTCACGG ATGGCGCAAC 

301 GAGTGTTTCG ACCTGACCGA CGGCGGCGGC AACCCCTTGT TCACGCTCGA 

351 ACGCGCCGCT TTCCGTCCGT TCGGACTACT CAGCCGCGCC GTCCATCTCA 

401 ACGGTTTGGT CGAATCGAAC GGCAGATGGC ATTTTTGGAT AGGCAGGCGC 

451 AGTCCGCACA AAGCAGTCGa tcCCGGCAAG CTCGACAATA TTGCCGGCGG 

501 CGGTGTTTCC GGCGGCGAAA TGCCGTCTGA AGCCGTGTGC CGCGAAAGCA 

551 GCGAAGAAGC CGGTTTGGAT AAAACGCTGT TTCCGCTCAT CCGCCCAGTA 

601 TCGCGGCTGC ACAGCCTTCG CCCCGTCAGC CGAGGTGTGC ACAATGAAAT 

651 CCTGTATGTG TTCGATGCCG TCCTGCCCGA AACCTTCCTG CCTGAAAATC 

701 AGGATGGCGA GGTAGCGGGT TTTGAAAAGA TGGACATTGG CGGCCTATTG 

751 GATGCCATGT TGTCGAAAAA CATGATGCAC GACGCGCAAC TGGTTACGCT

801 GGACGCGTTT TACCGTTACG GTCTGATTGA TGCCGCCCAT CCGCTGTCCG 

851 AGTGGCTGGA CGGCATACGT TTATAG 

This corresponds to the amino acid sequence <SEQ ID 418; ORF105ng-1>:



1 MPTVRFTESV SKQDLDALFE RAKASYGAES CWKTLYLNRL PLGNLSPEWA 

51 ERIKKDWEAG CSESSDGIFL NADGWPDMGG RLQHLARTWN KAGLLHGWRN 

101 ECFDLTDGGG NPLFTLERAA FRPFGLLSRA VHLNGLVESN GRWHFWIGRR 

151 SPHKAVDPGK LDNIAGGGVS GGEMPSEAVC RESSEEAGLD KTLFPLIRPV 

201 SRLHSLRPVS RGVHNEILYV FDAVLPETFL PENQDGEVAG FEKMDIGGLL 

251 DAMLSKNMMH DAQLVTLDAF YRYGLIDAAH PLSEWLDGIR L* 

ORF105ng-1 and ORF105-1 show 93.5% identity in 291 aa overlap:



10 20 30 40 50 60 

orf 105-1 . pep MPTVRFTESVSKQDLDALFEWAKASYGAESCWKTLYLNGLPLGNLSPEWVERVKKDWEAG 
I I I I I I I 1 ! I 1 I I I I I I 1 I ) M I I I I I I I I I I ! I I I I I I I I M I I I I : I I : I I I I I I I 
orfl05ng-l MPTVRFTESVSKQDLDALFERAKASYGAESCWKTLYLNRLPLGNLSPEWAERIKKDWEAG 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 105-1 .pep CSESSDGIFLNADGWPDMGGRLQHLALGWHCAGLLDGWRNECFDLTDGGGNPLFTLERAA 
I I I! 1 I I t I I i ! t I ! i i i I I I I I i I I I : till I i I I I I I I i I i t I I I I I I I I t I I I 
orf 105ng-l CSESSDGIFLNADGWPDMGGRLQHLARTWNKAGLLHGWRNECFDLTDGGGNPLFTLERAA 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 105-1 . pep FRPFGLLSRAVHLNGLTESDGRWHFWIGRRSPHKAVDPNKLDNTAAGGVSGGEMPSEAVC 
I I I I I I I I I I I I t I I I : I I : I I I I 1 I I 1 I I I I I I I I 1 I : M I I I : I I 1 I I I M I I I I I I 
orf 105ng-l FRPFGLLSRAVHLNGLVESNGRWHFWIGRRSPHKAVDPGKLDNIAGGGVSGGEMPSEAVC 
130 140 150 160 170 180 



190 200 210 220 230 240 

orf 105-1 . pep RESSEEAGLDKTLLPLIRPVSQLHSLRSVSRGVHNEILYVFDAVLPETFLPENQDGEVAG 
I I 1 I 1 I I I I I M I : I I I I ! I I : I ! I I I I I I II I I I I I I I I I I I I I ! I I I I I I ! I I I I I I 
orf 105ng-l RESSEEAGLDKTLFPLIRPVSRLHSLRPVSRGVHNEILYVFDAVLPETFLPENQDGEVAG 

190 200 210 220 230 240

250 260 270 280 290

orf105-1.pep FEKMDIGGLLDAMLSGNMMHDAQLVTLDAFCRYGLIDAAHPLSEWLDGIRLX
| | I | t 1 II I t I I 1! I I I I I I I I I I I I I I I I I M I I 1 I I I I I ! I f I f I I ! I
orf105ng-1 FEKMDIGGLLDAMLSKNMMHDAQLVTLDAFYRYGLIDAAHPLSEWLDGIRLX
250 260 270 280 290

Furthermore, ORF105ng-1 shows homology with a yeast enzyme:

sp|P41888|TNR3_SCHPO THIAMIN PYROPHOSPHOKINASE (TPK) (THIAMIN KINASE)
>gi|1076928|pir||S52350 thiamin pyrophosphokinase (EC 2.7.6.2) - fission yeast
(Schizosaccharomyces pombe) >gi|666111 (X84417) thiamin pyrophosphokinase
[Schizosaccharomyces pombe] >gi|2330852|gnl|PID|e334056 (298533) thiamin
pyrophosphokinase [Schizosaccharomyces pombe] Length = 569
Score = 105 bits (259), Expect = 4e-22

Identities = 64/192 (33%), Positives = 94/192 (48%), Gaps = 3/192 (1%)

Query: 268 NKAGLLHGWRNECFDLTDGGGNPLFTLERAAFRPFGLLSRAVHLNGLVESNGRW--HFWI 441
           N G+ WRNE + + P+ +ER F FG LS VH + + W+
Sbjct: 96 NTFGIADQWRNELYTVYGKSKKPVLAVERGGFWLFGFLSTGVHCTMYIPATKEHPLRIWV 155

Query: 442 GRRSPHKAVDPGKLDNIAGGGVSGGEMPSEAVCRESSEEAGLDKTLFPLIRPVSRLHSLR 621
           RRSP K P LDN GG++ G+ + +E SEEA LD + LI P + ++
Sbjct: 156 PRRSPTKQTWPNYLDNSVAGGIAHGDSVIGTMIKEFSEEANLDVSSMNLI-PCGTVSYIK 214

Query: 622 PVSRG-VHNEILYVFDAVLPETFLPENQDGEVAGFEKMDIGGLLDAMLSKNMMHDAQLVT 798
           R + E+ YVFD + + +P DGEVAGF + + +L + K+ + LV
Sbjct: 215 MEKRHWIQPELQYVFDLPVDDLVIPRINDGEVAGFSLLPLNQVLHELELKSFKPNCALVL 274

Query: 799 LDAFYRYGLIDAAHP 843
           LD R+G+I HP
Sbjct: 275 LDFLIRHGIITPQHP 289

Based on this analysis, including the presence of a putative transmembrane domain in the
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
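
Each amino acid sequence in this example is the conceptual translation of the nucleotide sequence listed before it. The Python sketch below shows that step with the standard genetic code. The helper name `translate` is hypothetical, and rendering any codon that contains a non-ACGT symbol as X is an assumption that mirrors the X residues in the partial sequences above; the text does not describe the software actually used.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (TTT, TTC, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate a DNA string in reading frame 1.

    Whitespace is ignored, case is ignored, a trailing partial codon is
    dropped, stop codons become '*', and codons with ambiguous or unknown
    bases become 'X' (an assumption, see the lead-in).
    """
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# First 30 nucleotides of SEQ ID 411, as printed above.
print(translate("ATGCCGACCG TCCGTTTTAC CGAATCCGTC"))
```

The printed peptide can be compared with the first ten residues listed for SEQ ID 412.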






Example 49 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID
419>:

1 ATGAATAGAC CCAAGCAACC CTTCTTCCGT CCCGAAGTCG CCGTTGCCCG
51 CCAAACCAGC CTGACGGGTA AAGTGATTCT GACACGACCG TTGTCATTTT
101 CCCTATGGAC GACATTTGCA TCGATATCTG CGTTATTGAT TATCCTGTTT
151 TTGATATTTG GTAACTATAC GCGAAAGACA ACAGTGGAGG GACAAATTTT
201 ACCTGCATCG GGCGTAATCA GGGTGTATGC ACCGgATACG rGkACAATTA
251 CAGCGAAATT CGTGGAAGAT GGmsAAAAGG TTAAGGCTGG CGACAAGCTA
301 TTTGCGCTTT CGACCTCACG TTTCGGCGCA GGAGGTAGCG TGCAGCAGCA
351 GTTGAAAACG GAGGCAGTTT TGAAGAAAAC GTTGGCAGAA CAGGAACTGG
401 GTCGTCTGAA GCTGATACAC GGGAATGAAA CGCGCAgCcT TAAAGCAACT
451 GTCGAACGTT TGGAAAACCA GGAACTCCAT ATTTCGCAAC AGATAGACGG
501 TCAGAAAAGG CGCATTAGAC TTGCGGAAGA AATGTTGCAG AAATATCGTT
551 TCCTATCCGC .CAATGA


This corresponds to the amino acid sequence <SEQ ID 420; ORF107>: 






1 MNRPKQPFFR PEVAVARQTS LTGKVILTRP LSFSLWTTFA SISALLIILF 

51 LIFGNYTRKT TVEGQILPAS GVIRVYAPDT XTITAKFVED GXKVKAGDKL 

101 FALSTSRFGA GGSVQQQLKT EAVLKKTLAE QELGRLKLIH GNETRSLKAT 

151 VERLENQELH ISQQIDGQKR RIRLAEEMLQ KYRFLSXQ* 



Computer analysis of this amino acid sequence gave the following results:

Homology with a predicted ORF from N. meningitidis (strain A)

ORF107 shows 97.8% identity over a 186aa overlap with an ORF (ORF107a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf 107 . pep MNRPKQPFFRPEVAVARQTSLTGKVILTRPLSFSLWTTFASISALLIILFLIFGNYTRKT 
I I M I I II I I M I I ! I I M I I I I I I I I I I I 1 M I ! I ! 11 I I II I I ! I 1 I I 1 I I M 1 I I i I 
orf107a MNRPKQPFFRPEVAVARQTSLTGKVILTRPLSFSLWTTFASISALLIILFLIFGNYTRKT

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 107 .pep TVEGQILPASGVIRVYAPDTXTITAKFVEDGXKVKAGDKLFALSTSRFGAGGSVQQQLKT 
I | i i i M I I i I I I I I I I I i I Mill! Ill I I I I I I ( I I I I i I t I I I I I I I I I I I I I 
orf 107a TVEGQILPASGVIRVYAPDTGTITAKFXEDGEKVKAGDKLFALSTSRFGAGDSVQQQLKT 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 107 . pep EAVLKKTLAEQELGRLKLIHGNETRSLKATVERLENQELHISQQIDGQKRRIRLAEEMLQ 
I I I M II I I I I I M I I I I I I I I II II I I M II I M I I I I I I I I I 11 I I I II 1 I II I I I 1 I 
orf 107a EAVLKKTLAEQELGRLKLIHGNETRSLKATVERLENQELHISQQIDGQKRRIRLAEEMLQ 

130 140 150 160 170 180 



189 

orf 107 .pep KYRFLSXQX 
I I 1 I I I 

orf 107a KYRFLSANDAVPKQEMMNVKAELLEQPCAKLDAYRREEVGLLQEIRTQNLTLXSLPQAAX 
190 200 210 220 230 

The complete length ORF107a nucleotide sequence <SEQ ID 421> is: 



1 ATGAATAGAC CCAAGCAACC NTTCTTCCGT CCCGAAGTCG CCGTTGCCCG 

51 CCAAACCAGC CTGACGGGTA AAGTGATTCT GACACGACCG TTGTCATTTT 

101 CCCTATGGAC GACATTTGCA TCGATATCTG CGTTATTGAT TATCCTGTTT 

151 TTGATATTTG GTAACTATAC GCGAAAGACA ACAGTGGAGG GACAAATTTT 

201 ACCTGCATCG GGCGTAATCA GGGTGTATGC ACCGGATACG GGGACAATTA 

251 CNGCGAAATT CNTGGAAGAT GGAGAAAAGG TTAAGGCTGG CGACAAGCTA 

301 TTTGCGCTTT CGACCTCACG TTTCGGCGCA GGAGATAGCG TGCAGCAGCA 

351 GTTGAAAACG GAGGCAGTTT TGAAGAAAAC GTTGGCAGAA CAGGAACTGG 

401 GTCGTCTGAA GCTGATACAC GGGAATGAAA CGCGCAGCCT TAAAGCAACT

451 GTCGAACGTT TGGAAAACCA GGAACTCCAT ATTTCGCAAC AGATAGACGG

501 TCAGAAAAGG CGCATTAGAC TTGCGGAAGA AATGTTGCAG AAATATCGTT 

551 TCCTATCCGC CAATGATGCA GTGCCAAAAC AAGAAATGAT GAATGTCAAG 

601 GCAGAGCTTT TAGAGCAGAA AGCCAAACTT GATGCCTACC GCCGAGAAGA 

651 AGTCGGGCTG CTTCAGGAAA TCCGCACGCA GAATCTGACA TTGGNNAGCC 

701 TCCCCCAAGC GGCATGA 

This encodes a protein having amino acid sequence <SEQ ID 422>: 



1 MNRPKQPFFR PEVAVARQTS LTGKVILTRP LSFSLWTTFA SISALLIILF

51 LIFGNYTRKT TVEGQILPAS GVIRVYAPDT GTITAKFXED GEKVKAGDKL

101 FALSTSRFGA GDSVQQQLKT EAVLKKTLAE QELGRLKLIH GNETRSLKAT 

151 VERLENQELH ISQQIDGQKR RIRLAEEMLQ KYRFLSANDA VPKQEMMNVK 

201 AELLEQKAKL DAYRREEVGL LQEIRTQNLT LXSLPQAA* 



Homology with a predicted ORF from N. gonorrhoeae 

ORF107 shows 95.7% identity over a 188aa overlap with a predicted ORF (ORF107.ng) from N. 
gonorrhoeae: 



orf107.pep MNRPKQPFFRPEVAVARQTSLTGKVILTRPLSFSLWTTFASISALLIILFLIFGNYTRKT 60
           ||||||||||||||:|||||||||||||||||||||||||||||||||||||||||||||
orf107ng   MNRPKQPFFRPEVAIARQTSLTGKVILTRPLSFSLWTTFASISALLIILFLIFGNYTRKT 60

orf107.pep TVEGQILPASGVIRVYAPDTXTITAKFVEDGXKVKAGDKLFALSTSRFGAGGSVQQQLKT 120
           |:|||||||||||||||||| |||||||||| ||||||||||||||||||||||||||||
orf107ng   TMEGQILPASGVIRVYAPDTGTITAKFVEDGEKVKAGDKLFALSTSRFGAGGSVQQQLKT 120

orf107.pep EAVLKKTLAEQELGRLKLIHGNETRSLKATVERLENQELHISQQIDGQKRRIRLAEEMLQ 180
           |||||||||||||||||||| ||||||||||||||||:|||||||||||||||||||||:
orf107ng   EAVLKKTLAEQELGRLKLIHENETRSLKATVERLENQKLHISQQIDGQKRRIRLAEEMLR 180

orf107.pep KYRFLSXQ 188
           |||||| |
orf107ng   KYRFLSAQ 188

The complete length ORF107ng nucleotide sequence <SEQ ID 423> is predicted to encode a 
protein having amino acid sequence <SEQ ID 424>: 

1 MNRPKQPFFR PEVAIARQTS LTGKVILTRP LSFSLWT TFA SISALLIILF

51 LIFG NYTRKT TMEGQILPAS GVIRVYAPDT GTITAKFVED GEKVKAGDKL 

101 FALSTSRFGA GGSVQQQLKT EAVLKKTLAE QELGRLKLIH ENETRSLKAT 

151 VERLENQKLH ISQQIDGQKR RIRLAEEMLR KYRFLSAQ* 

Based on the presence of a putative transmembrane domain in the gonococcal protein, it is predicted
that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful
antigens for vaccines or diagnostics, or for raising antibodies.



Example 50 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 
425>: 



1 ATGCTGAATA CTTTTTTTGC CGTATTGGGC GGCTGCCTGC TGCT.TTGCC 

51 GTGCGGCAAA TCCGTAAATA CGGCGGTACA GCCGCAAAAC GCGGTACAAA 

101 GCGCGCCGAA ACCGGTTTTC AAAGTCATAT ATATCGACAA TACGGCGATT 

151 GCCGGTTTGG ATTTGGGACA AAGCAGCGAA GGCAAAACCA ACGACGGCAA 

201 AAAACAAATC AGTTATCCGA TTAAAGGCTT GCCGGAACAA AATGTTATCC 

251 GACTGATCGG CAAGCATCCC GGCGACTTGG AAGCCGTCAG CGGCAAATGT 

301 ATGGAAACCG ATGATAAGGA CAGTCCGGCA GGTTGGGCAG AAAACGGCGT 

351 GTGCCATACC TTGTTTGCCA AACTGGTGGG CAATATCGCC GAAGACGGCG 

401 GCAAACTGAC GGATTACCTA GTTTCGCATG CCGCCCTGCA ACCCTATCAG

451 GCAGGCAAAA GCGGCTATGC CGCCGTGCAG AACGGACGCT ATGTGCTGGA

501 AATCGACAGC GAAGGGGCGT TTTATTTCCG CCGCCGCCAT TATTGA 

This corresponds to the amino acid sequence <SEQ ID 426; ORF108>: 



1 MLNTFFAVLG GCLLXLPCGK SVNTAVQPQN AVQSAPKPVF KVIYIDNTAI 

51 AGLDLGQSSE GKTNDGKKQI SYPIKGLPEQ NVIRLIGKHP GDLEAVSGKC 

101 METDDKDSPA GWAENGVCHT LFAKLVGNIA EDGGKLTDYL VSHAALQPYQ 

151 AGKSGYAAVQ NGRYVLEIDS EGAFYFRRRH Y* 
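
The correspondence between a DNA sequence and the amino acid sequence it encodes can be reproduced with the standard genetic code. The sketch below is an illustration only, assuming translation in the first reading frame; the helper name `translate` is hypothetical. As a simplification, any codon containing an undetermined base (shown as "." or "N" in the listings) is reported as X, even where the remaining bases would determine the residue.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons are ordered
# TTT, TTC, TTA, TTG, TCT, ... with the first base varying slowest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in reading frame 1; '*' marks a stop codon.

    Whitespace from the 10-base layout is ignored and case is normalised.
    Codons containing anything other than A, C, G or T become 'X'.
    """
    seq = "".join(dna.split()).upper()
    residues = []
    for i in range(0, len(seq) - 2, 3):
        residues.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(residues)


# First 60 bases of SEQ ID 425, including the undetermined base at position 45.
start = "ATGCTGAATA CTTTTTTTGC CGTATTGGGC GGCTGCCTGC TGCT.TTGCC GTGCGGCAAA"
print(translate(start))
```

Applied to the first 60 bases of SEQ ID 425, this gives the first 20 residues of ORF108, with X at the codon that contains the undetermined base.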

Further work revealed the following DNA sequence <SEQ ID 427>: 



1 ATGCTGAAAA CATCTTTTGC CGTATTGGGC GGCTGCCTGC TGCTTGCCGC 

51 CTGCGGCAAA TCCGAAAATA CGGCGGAACA GCCGCAAAAC GCGGTACAAA 

101 GCGCGCCGAA ACCGGTTTTC AAAGTCAAAT ATATCGACAA TACGGCGATT 

151 GCCGGTTTGG ATTTGGGACA AAGCAGCGAA GGCAAAACCA ACGACGGCAA 

201 AAAACAAATC AGTTATCCGA TTAAAGGCTT GCCGGAACAA AATGTTATCC 

251 GACTGATCGG CAAGCATCCC GGCGACTTGG AAGCCGTCAG CGGCAAATGT 

301 ATGGAAACCG ATGATAAGGA CAGTCCGGCA GGTTGGGCAG AAAACGGCGT 

351 GTGCCATACC TTGTTTGCCA AACTGGTGGG CAATATCGCC GAAGACGGCG 

401 GCAAACTGAC GGATTACCTA GTTTCGCATG CCGCCCTGCA ACCCTATCAG 

451 GCAGGCAAAA GCGGCTATGC CGCCGTGCAG AACGGACGCT ATGTGCTGGA 

501 AATCGACAGC GAAGGGGCGT TTTATTTCCG CCGCCGCCAT TATTGA 

This corresponds to the amino acid sequence <SEQ ID 428; ORF108-1>:

1 MLKTSFAVLG GCLLLAA CGK SENTAEQPQN AVQSAPKPVF KVKYIDNTAI

51 AGLDLGQSSE GKTNDGKKQI SYPIKGLPEQ NVIRLIGKHP GDLEAVSGKC 

101 METDDKDSPA GWAENGVCHT LFAKLVGNIA EDGGKLTDYL VSHAALQPYQ 

151 AGKSGYAAVQ NGRYVLEIDS EGAFYFRRRH Y* 

Computer analysis of this amino acid sequence gave the following results:

Homology with a predicted ORF from N. gonorrhoeae

ORF108 shows 88.4% identity over a 181aa overlap with a predicted ORF (ORF108.ng) from N.
gonorrhoeae:

orf108.pep  MLNTFFAVLGGCLLXLPCGKSVNTAVQPQNAVQSAPKPVFKVIYIDNTAIAGLDLGQSSE 60
            ||:  |||||||||   |||| ||| |||||:|||||||||| |||||||||| ||||||
orf108ng    MLKIPFAVLGGCLLLAACGKSENTAEQPQNAAQSAPKPVFKVKYIDNTAIAGLALGQSSE 60

orf108.pep  GKTNDGKKQISYPIKGLPEQNVIRLIGKHPGDLEAVSGKCMETDDKDSPAGWAENGVCHT 120
            |||||||||||||||||||||::|| ||||:||||| ||||||| ||:|:||||||||||
orf108ng    GKTNDGKKQISYPIKGLPEQNAVRLTGKHPNDLEAVVGKCMETDGKDAPSGWAENGVCHT 120

orf108.pep  LFAKLVGNIAEDGGKLTDYLVSHAALQPYQAGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY 181
            ||||||||||||||||||||:||:|||||||||||||||||||||||||||||||||||||
orf108ng    LFAKLVGNIAEDGGKLTDYLISHSALQPYQAGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY 181

ORF108-1 shows 92.3% identity with ORF108ng over the same 181 aa overlap: 

orf108-1.pep  MLKTSFAVLGGCLLLAACGKSENTAEQPQNAVQSAPKPVFKVKYIDNTAIAGLDLGQSSE 60
              |||  ||||||||||||||||||||||||||:||||||||||||||||||||| ||||||
orf108ng-1    MLKIPFAVLGGCLLLAACGKSENTAEQPQNAAQSAPKPVFKVKYIDNTAIAGLALGQSSE 60

orf108-1.pep  GKTNDGKKQISYPIKGLPEQNVIRLIGKHPGDLEAVSGKCMETDDKDSPAGWAENGVCHT 120
              |||||||||||||||||||||::|| ||||:||||| ||||||| ||:|:||||||||||
orf108ng-1    GKTNDGKKQISYPIKGLPEQNAVRLTGKHPNDLEAVVGKCMETDGKDAPSGWAENGVCHT 120

orf108-1.pep  LFAKLVGNIAEDGGKLTDYLVSHAALQPYQAGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY 181
              ||||||||||||||||||||:||:|||||||||||||||||||||||||||||||||||||
orf108ng-1    LFAKLVGNIAEDGGKLTDYLISHSALQPYQAGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY 181

The complete length ORF108ng nucleotide sequence <SEQ ID 429> is: 

1 ATGCTGAAAa tacctTTTGC CGTGTtgggc ggCtgcctGC TGCTTGCCGC 

51 CTGCGGCAAA TCCGAAAATa cggcggaACA GCCGCAAAAT gcggCACAAA

101 GCGCGCCGAA ACCGGTTTTC AAAGTCAAAT ACATCGACAA TACGGCGATT 

151 GCCGGTTTGG CTTTGGGACA AAGTAGCGAA GGCAAAACCA acgacgGCAA 

201 AAAACAAATC AGTTATccgA TTAAAGGCTT GCCGGAACAA Aacgccgtcc 

251 gGCTGACCGG AAAGCATCCC AACGACTTGG AagccgtcgT CGGCAAATGT 

301 ATGGAAACCG ACGGAAAGGA CGCGCCTTCG GGCTGGGCGG AAAACGGCGT

351 GTGCCATACC TTGTTTGCCA AACTGGTGGG CAATATCGCC GAAGACGGCG 

401 GCAAACTGAC TGATTACCTG ATTTCGCATT CCGCCCTGCA ACCCTATCAG 

451 GCAGGCAAAA GCGGCTATGC CGCCGTGCAG AACGGACGCT ATGTGCTGGA

501 AATCGACAGC GagggGGCGT TTTATttccg ccgccgccat tattgA 

This encodes a protein having amino acid sequence <SEQ ID 430>:

1 MLKIPFA VLG GCLLLAAC GK SENTAEQPQN AAQSAPKPVF KVKYIDNTAI

51 AGLAL GQSSE GKT NDGKKQI SYPIKGLPEQ NAVRLTGKHP NDLEAVVGKC

101 METDGKDAPS GWAENGVCHT LFAKLVGNIA EDGGKLTDYL ISHSALQPYQ 

151 AGKSGYAAVQ NGRYVLEIDS EGAFYFRRRH Y* 

Based on this analysis, including the presence of a predicted prokaryotic membrane lipoprotein
lipid attachment site (underlined) and a putative ATP/GTP-binding site motif A (P-loop,
double-underlined) in the gonococcal protein, it is predicted that the proteins from N. meningitidis
and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.
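
The two motifs named above can be located by simple pattern matching. The sketch below is illustrative only: the source does not state which tool or pattern definitions were used, so the two regular expressions are assumptions written in the style of PROSITE consensus patterns, and the names `P_LOOP`, `LIPOBOX` and `find_motifs` are hypothetical.

```python
from __future__ import annotations

import re

# Assumed consensus patterns, expressed as regular expressions.
# ATP/GTP-binding site motif A (P-loop): [AG]-x(4)-G-K-[ST]
P_LOOP = re.compile(r"[AG].{4}GK[ST]")
# Prokaryotic lipoprotein lipid attachment site, ending at the modified Cys.
LIPOBOX = re.compile(r"[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C")


def find_motifs(protein: str, pattern: re.Pattern) -> list[tuple[int, int, str]]:
    """Return (start, end, matched text) using 1-based inclusive positions."""
    return [(m.start() + 1, m.end(), m.group()) for m in pattern.finditer(protein)]


# N-terminal 70 residues of the gonococcal protein (SEQ ID 430).
ORF108NG_NTERM = (
    "MLKIPFAVLGGCLLLAACGKSENTAEQPQNAAQSAPKPVFKVKYIDNTAIAGLALGQSSEGKTNDGKKQI"
)

print("lipid attachment site:", find_motifs(ORF108NG_NTERM, LIPOBOX))
print("P-loop:", find_motifs(ORF108NG_NTERM, P_LOOP))
```

Under these assumed patterns the lipid attachment site matches residues 8-18 (VLGGCLLLAAC, ending at Cys-18) and the P-loop matches residues 56-63 (GQSSEGKT). These are the same spans that are set off by spacing in the listing of SEQ ID 430.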



Example 51 



The following DNA sequence was identified in N. meningitidis <SEQ ID 431>:

1 ATGGAAGATT TATATATAAT ACTCGCTTTG GGTTTGGTTG CGATGATTGC
51 CGgATTTATC GATgcgatTg cGggCGGGGG TGGTTTGATT ACGCTGCCCG
101 CACTCTTGTT GGCAGGTATT CCTCCCGTGT CGGCAATTGC CACCAACAAG
151 CTGCAAgCAG CCGCTGCTAC GTTTTCAGCT ACGGTTTCTT TTGCACGCAA
201 AGGTTTGATT GATTGGAAGA AAGGTCTCCC GATTGCCGCA GCATCGTTTG
251 TAGGCGGCGT GGcCGGTGCA TTATCGGTCA GCTTGGTTTC CAAAGATATT
301 CTgCTgGCGG TCGTGCCGGT TTTGTTGATA TTTGTCGCAC TGTATTTTGT
351 GTTTTCGCCC AAGCTCGACG GCAGTAAGGA AGGCAAAGCC AGAATGTCTT
401 TTTTTCTGTT cGGGCTGACG GTCGC.ACCG CTTTTGGGTT TTTACGACGG
451 TGTGTTCGGA CCGGGTGTCG GCTCGTTTTT TCTGATTGCC TTTATTGTTT
501 TGCTCGGCTG CAAgCTGTTG AACGCGATGT CTTACACCAA ATTGGCGAAC
551 GTTGCCTGCA ATCTTGGTTC GCTATCGGTA TTCCTGCTGC ACGGTTCGAT
601 TATTTTCCCG ATTGCGGCAA CGaTGGCGGT CGGTGCGTTT GTCGGtGCGA
651 ATTTAgGTGC GAGATTTGCC GTaCgctTCG GTTCGAAGCT GATTAA



This corresponds to the amino acid sequence <SEQ ID 432; ORF109>:

1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIATNK
51 LQAAAATFSA TVSFARKGLI DWKKGLPIAA ASFVGGVAGA LSVSLVSKDI
101 LLAVVPVLLI FVALYFVFSP KLDGSKEGKA RMSFFLFGLT VXTAFGFLRR
151 CVRTGCRLVF SDCLYCFARL QAVERDVLHQ IGERCLQSWF AIGIPAARFD
201 YFPDCGNDGG RCVCRCEFRC EICRTLRFEA D*



Further work revealed the following DNA sequence <SEQ ID 433>:

1 ATGGAAGATT TATATATAAT ACTCGCTTTG GGTTTGGTTG CGATGATTGC
51 CGGATTTATC GATGCGATTG CGGGCGGGGG TGGTTTGATT ACGCTGCCCG
101 CACTCTTGTT GGCAGGTATT CCTCCCGTGT CGGCAATTGC CACCAACAAG
151 CTGCAAGCAG CCGCTGCTAC GTTTTCAGCT ACGGTTTCTT TTGCACGCAA
201 AGGTTTGATT GATTGGAAGA AAGGTCTCCC GATTGCCGCA GCATCGTTTG
251 TAGGCGGCGT GGCCGGTGCA TTATCGGTCA GCTTGGTTTC CAAAGATATT
301 CTGCTGGCGG TCGTGCCGGT TTTGTTGATA TTTGTCGCAC TGTATTTTGT
351 GTTTTCGCCC AAGCTCGACG GCAGTAAGGA AGGCAAAGCC AGAATGTCTT
401 TTTTTCTGTT CGGGCTGACG GTCGCACCGC TTTTGGGTTT TTACGACGGT
451 GTGTTCGGAC CGGGTGTCGG CTCGTTTTTT CTGATTGCCT TTATTGTTTT
501 GCTCGGCTGC AAGCTGTTGA ACGCGATGTC TTACACCAAA TTGGCGAACG
551 TTGCCTGCAA TCTTGGTTCG CTATCGGTAT TCCTGCTGCA CGGTTCGATT
601 ATTTTCCCGA TTGCGGCAAC GATGGCGGTC GGTGCGTTTG TCGGTGCGAA
651 TTTAGGTGCG AGATTTGCCG TCCGCTTCGG TTCGAAGCTG ATTAAGCCGC
701 TGCTGATTGT CATCAGCATT TCGATGGCTG TGAAATTGTT GATAGACGAG
751 AGAAATCCGC TGTATCAGAT GATTGTTTCG ATGTTTTAA


This corresponds to the amino acid sequence <SEQ ID 434; ORF109-1>:

1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIA TNK
51 LQAAAATFSA TVSFARKGLI DWKKGLPI AA ASFVGGVAGA LSVSLV SKDI
101 LLAVVPVLLI FVALYFV FSP KLDGSKEGKA R MSFFLFGLT VAPLLGFY DG
151 VFGPG VGSFF LIAFIVLLGC KL LNAMSYTK LANVACNLGS LSVFLLHGSI
201 IFPIAATMAV GAFVGA NLGA RFAVRFGSKL IK PLLIVISI SMAVKLLID E
251 RNPLYQMIVS MF*



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF109 shows 95.9% identity over a 147aa overlap with an ORF (ORF109a) from strain A of N.
meningitidis:

                    10        20        30        40        50        60
orf109.pep  MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109a     MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf109.pep  TVSFARKGLIDWKKGLPIAAASFVGGVAGALSVSLVSKDILLAVVPVLLIFVALYFVFSP
            |||||||||||||||||||||||:|||:||||||||||||||||||||||||||||||||
orf109a     TVSFARKGLIDWKKGLPIAAASFAGGVVGALSVSLVSKDILLAVVPVLLIFVALYFVFSP
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf109.pep  KLDGSKEGKARMSFFLFGLTVXTAFGFLRRCVRTGCRLVFSDCLYCFARLQAVERDVLHQ
            |||||||||||||||||||||   :||
orf109a     KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK
                   130       140       150       160       170       180

The complete length ORF109a nucleotide sequence <SEQ ID 435> is:

1 ATGGAAGATT TATACATAAT ACTCGCTTTG GGTTTGGTTG CGATGATTGC
51 CGGATTTATC GATGCGATTG CGGGTGGGGG TGGTTTGATT ACGCTGCCTG
101 CACTCTTGTT GGCAGGTATT CCTCCCGTGT CGGCAATTGC CACCAACAAG
151 CTGCAAGCAG CCGCTGCTAC GTTTTCGGCT ACGGTTTCTT TTGCACGCAA
201 AGGTTTGATT GATTGGAAGA AAGGTCTCCC GATTGCGGCA GCATCGTTTG
251 CAGGCGGCGT GGTCGGTGCA TTATCGGTCA GCTTGGTTTC CAAAGATATT
301 CTGCTGGCGG TCGTGCCGGT TTTGTTGATA TTTGTCGCGC TGTATTTTGT
351 GTTTTCGCCC AAGCTCGACG GCAGTAAGGA AGGCAAAGCC AGAATGTCTT
401 TTTTTCTGTT CGGTCTGACG GTTGCACCAC TTTTGGGTTT TTACGACGGT
451 GTGTTCGGAC CGGGTGTCGG CTCGTTTTTT CTGATTGCCT TTATTGTTTT
501 GCTCGGCTGC AAGCTGTTGA ACGCGATGTC TTACACCAAA TTGGCGAACG
551 TTGCCTGCAA TCTTGGTTCG CTATCGGTAT TCCTGCTGCA CGGTTCGATT
601 ATTTTCCCGA TTGCGGCAAC GATGGCGGTC GGTGCGTTTG TCGGTGCGAA
651 TTTAGGTGCG AGATTTGCCG TCCGCTTCGG TTCGAAGCTG ATTAAGCCGC
701 TGCTGATTGT CATCAGCATT TCGATGGCTG TGAAATTGTT GATAGACGAG
751 AGAAATCCGC TGTATCAGAT GATTGTTTCG ATGTTTTAA



This encodes a protein having amino acid sequence <SEQ ID 436>: 



1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIA TNK 

51 LQAAAATFSA TVSFARKGLI DWKKGLPI AA ASFAGGVVGA LSVSLV SKDI 

101 LLAVVPVLLI FVALYF VFSP KLDGSKEGKA R MSFFLFGLT VAPLLGFY DG

151 VFGPG VGSFF LIAFIVLLGC KL LNAMSYTK LANVACNLGS LSVFLLHGSI 

201 IFPIAATMAV GAFVGAN LGA RFAVRFGSKL IK PLLIVISI SMAVKLLID E 

251 RNPLYQMIVS MF* 

ORF109a and ORF109-1 show 99.2% identity in 262 aa overlap:

                     10        20        30        40        50        60
orf109a.pep  MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109-1     MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf109a.pep  TVSFARKGLIDWKKGLPIAAASFAGGVVGALSVSLVSKDILLAVVPVLLIFVALYFVFSP
             |||||||||||||||||||||||:|||:||||||||||||||||||||||||||||||||
orf109-1     TVSFARKGLIDWKKGLPIAAASFVGGVAGALSVSLVSKDILLAVVPVLLIFVALYFVFSP
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf109a.pep  KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109-1     KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf109a.pep  LANVACNLGSLSVFLLHGSIIFPIAATMAVGAFVGANLGARFAVRFGSKLIKPLLIVISI
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109-1     LANVACNLGSLSVFLLHGSIIFPIAATMAVGAFVGANLGARFAVRFGSKLIKPLLIVISI
                    190       200       210       220       230       240

                    250       260
orf109a.pep  SMAVKLLIDERNPLYQMIVSMFX
             |||||||||||||||||||||||
orf109-1     SMAVKLLIDERNPLYQMIVSMFX
                    250       260

Homology with a predicted ORF from N. gonorrhoeae 

ORF109 shows 98.3% identity over a 231aa overlap with a predicted ORF (ORF109.ng) from N.
gonorrhoeae:

orf109.pep  MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA 60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109ng    MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA 60

orf109.pep  TVSFARKGLIDWKKGLPIAAASFVGGVAGALSVSLVSKDILLAVVPVLLIFVALYFVFSP 120
            |||||||||||||||||||||||:|||:||||||||||||||||||||||||||||||||
orf109ng    TVSFARKGLIDWKKGLPIAAASFAGGVVGALSVSLVSKDILLAVVPVLLIFVALYFVFSP 120

orf109.pep  KLDGSKEGKARMSFFLFGLTVXTAFGFLRRCVRTGCRLVFSDCLYCFARLQAVERDVLHQ 180
            ||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||
orf109ng    KLDGSKEGKARMSFFLFGLTVATAFGFLRRCVRTGCRLVFSDCLYCFARLQAVERDVLHQ 180

orf109.pep  IGERCLQSWFAIGIPAARFDYFPDCGNDGGRCVCRCEFRCEICRTLRFEAD 231
            |||||||||||||||||||||||||||||||||||||||||||| ||||||
orf109ng    IGERCLQSWFAIGIPAARFDYFPDCGNDGGRCVCRCEFRCEICRPLRFEAD 231



An ORF109ng nucleotide sequence <SEQ ID 437> was predicted to encode a protein having amino
acid sequence <SEQ ID 438>:

1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI T LPALLLAGI PPVSAIATNK

51 LQAAAATFSA TVSFARKGLI DWKKGLPIA A ASFAGGVVGA LSVSLV SKDI

101 LLAVVPVLLI FVALYF VFSP KLDGSKEGKA R MSFFLFGLT VATAFGFL RR 

151 CVRTGCRLVF SDCLYCFARL QAVERDVLHQ IGERCLQSWF AIGIPAARFD 

201 YFPDCGNDGG RCVCRCEFRC EICRPLRFEA D* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 439>:

1 ATGGAAGATT TATACATAAT ACTCGCTTTG GGTTTGGTTG CGATGATCGC
51 CGGATTTATC GATGCGATTG CGGGCGGGGG TGGTTTGATT ACGCTGCCTG
101 CACTCTTGTT GGCAGGTATT CCTCCCGTGT CGGCAATTGC CACCAACAAG
151 CTGCAAGCAG CCGCTGCTAC GTTTTCGGCT ACGGTTTCTT TTGCACGCAA
201 AGGTTTGATT GATTGGAAGA AAGGTCTCCC GATTGCCGCA GCATCGTTTG
251 CAGGCGGCGT GGTCGGTGCA TTATCGGTCA GCTTGGTTTC CAAAGATATT
301 TTGCTGGCGG TCGTGCCGGT TTTGTTGATA TTTGTCGCGC TGTATTTTGT
351 GTTTTCGCCC AAGCTCGACG GCAGTAAGGA AGGCAAAGCC AGAATGTCTT
401 TTTTTCTATT CGGGCTGACG GTTGCACCGC TTTTGGGTTT TTACGACGGT
451 GTGTTCGGAC CGGGTGTCGG CTCGTTTTTT CTGATTGCCT TTATTGTTTT
501 GCTCGGCTGC AAGCTGTTGA ACGCGATGTC TTACACCAAA TTGGCGAACG
551 TTGCTTGCAA TCTTGGTTCG CTATCGGTAT TCCTGCTGCA CGGTTCGATT
601 ATTTTCCCGA TTGTGGCAAC GATGGCGGTC GGTGCGTTTG TCGGTGCGAA
651 TTTAGGTGCG AGATTTGCCG TCCGCTTCGG TTCGAAGCTG ATTAAGCCGC
701 TGCTGATTGT CATCAGCATT TCGATGGCTG TGAAATTGTT GATAGACGAG
751 AGAAATCCGC TGTATCAGAT GATTGTTTCG ATGTTTTAA



This corresponds to the amino acid sequence <SEQ ID 440; ORF109ng-1>:

1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIA TNK
51 LQAAAATFSA TVSFARKGLI DWKKGLPIA A ASFAGGVVGA LSVSLV SKDI
101 LLAVVPVLLI FVALYFV FSP KLDGSKEGKA RMSFFLFGLT VAPLLGFY DG
151 VFGPG VGSFF LIAFIVLLGC KL LNAMSYTK LANVACNLGS LSVFLLHGSI
201 IFPIVATMAV GAFVGA NLGA RFAVRFGSKL IK PLLIVISI SMAVKLL ID E
251 RNPLYQMIVS MF*

ORF109ng-1 and ORF109-1 show 98.9% identity in 262 aa overlap:

                        10        20        30        40        50        60
orf109ng-1.pep  MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109-1        MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA
                        10        20        30        40        50        60

                        70        80        90       100       110       120
orf109ng-1.pep  TVSFARKGLIDWKKGLPIAAASFAGGVVGALSVSLVSKDILLAVVPVLLIFVALYFVFSP
                |||||||||||||||||||||||:|||:||||||||||||||||||||||||||||||||
orf109-1        TVSFARKGLIDWKKGLPIAAASFVGGVAGALSVSLVSKDILLAVVPVLLIFVALYFVFSP
                        70        80        90       100       110       120

                       130       140       150       160       170       180
orf109ng-1.pep  KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109-1        KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK
                       130       140       150       160       170       180

                       190       200       210       220       230       240
orf109ng-1.pep  LANVACNLGSLSVFLLHGSIIFPIVATMAVGAFVGANLGARFAVRFGSKLIKPLLIVISI
                ||||||||||||||||||||||||:|||||||||||||||||||||||||||||||||||
orf109-1        LANVACNLGSLSVFLLHGSIIFPIAATMAVGAFVGANLGARFAVRFGSKLIKPLLIVISI
                       190       200       210       220       230       240

                       250       260
orf109ng-1.pep  SMAVKLLIDERNPLYQMIVSMFX
                |||||||||||||||||||||||
orf109-1        SMAVKLLIDERNPLYQMIVSMFX
                       250       260

In addition, ORF109ng-1 shows homology to a hypothetical Pseudomonas protein:

sp|P29942|YCB9_PSEDE HYPOTHETICAL 27.4 KD PROTEIN IN COBO 3' REGION (ORF9)
>gi|94984|pir||I38164 hypothetical protein 9 - Pseudomonas sp >gi|551929
(M62866) ORF9 [Pseudomonas denitrificans] Length = 261
Score = 175 bits (439), Expect = 3e-43

Identities = 83/214 (38%), Positives = 131/214 (60%), Gaps = 1/214 (0%)

Query: 41  PPVSAIATNKLQXXXXXXXXXXXXXRKGLIDWKKGLPIXXXXXXXXXXXXXXXXXXXKDI 100
           PP+  + TNKLQ             R+G ++ K+ LP+                    D+
Sbjct: 43  PPLQTLGTNKLQGLFGSGSATLSYARRGHVNLKEQLPMALMSAAGAVLGALLATIVPGDV 102

Query: 101 LLAVVPVLLIFVALYFVFSPKLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFF 160
           L A++P LLI +ALYF   P + G  +  +R++ F+F LT+ PL+GFYDGVFGPG GSFF
Sbjct: 103 LKAILPFLLIAIALYFGLKPNM-GDVDQHSRVTPFVFTLTLVPLIGFYDGVFGPGTGSFF 161

Query: 161 LIAFIVLLGCKLLNAMSYTKLANVACNLGSLSVFLLHGSIIFPIVATMAVGAFVGANLGA 220
           ++ F+ L G  +L A ++TK  N   N+G+  VFL  G++++ +   M +G F+GA +G+
Sbjct: 162 MLGFVTLAGFGVLKATAHTKFLNFGSNVGAFGVFLFFGAVLWKVGLLMGLGQFLGAQVGS 221

Query: 221 RFAVRFGSKLIKPLLIVISISMAVKLLIDERNPL 254
           R+A+  G+K+IKPLL+++SI++A++LL D  +PL
Sbjct: 222 RYAMAKGAKIIKPLLVIVSIALAIRLLADPTHPL 255

Based on this analysis, including the presence of a putative leader sequence (double-underlined)
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 52

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 441>:

1 . CTGCTAGGGI ATTGCATCGG TTATCGGTAC GgCTGTTGCA GCAAAACCAG
51 CCGCAGACGG ATTATTTGGT CAAATTCGGA TCGTTTTGGG CGAG.ATTTT
101 TGGTTTTCTG GGACTGTATG ACGTCTATGC TTCGGCATGG TTTGTCGTTA
151 TCATGATGTT TTTGGTGGTT TCTACCAGTT TGTGCCTGAT TCGCAATGTG
201 CCGCCGTTCT GGCGCGAAAT GAAGTCTTTT CGGGAAAAGG TTAAAGAAAA
251 ATCTCTGGCG GCGATGCGCC ATTCTTCGCT GTTGGATGTA AAAATTGCGC
301 CCGAGGTTGC CAAACGTTAT CTGGAAGTAC AAGGTTTTCA GGGGAAAACC
351 ATTAACCGTG AAGACGGGTC GGTTCTGATT GCCGCCAAAA AAGGCACAAT
401 GAACAAATGG GGCTATATCT TTGCCCATGT TGCTTTGATT GTCATTTGCC
451 TGGGCGGGTT GATAGACAGT AACCTGCTGT TGAAACTGGG TATGCTGACC
501 GGTCGGATTg TTCCGGACAA TCAGGCGGTT TATGCCAAGG ATTTC.AAGC
551 CCGAAAGTAT .TTTGGGTGC gTCCAATCTC TCATTTAGGG GCAACGTCAA
601 TATTTCCG.A GGGGCAGAgT GCGGATGTGG TTTTCCTGA



This corresponds to the amino acid sequence <SEQ ID 442; ORF110>:

1 ..LLGIASVIGT LLQQNQPQTD YLVKFGSFWA XIFGFLGLYD VYASAWFVVI

51 MMFLVVSTSL CLIRNVPPFW REMKSFREKV KEKSLAAMRH SSLLDVKIAP

101 EVAKRYLEVQ GFQGKTINRE DGSVLIAAKK GTMNKWGYIF AHVALIVICL 

151 GGLIDSNLLL KLGMLTGRIF RTIRRFMPRI XKPESXFGCV QSLI*GQRQY 

201 FXRGRVRMWF S* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with ORF88a from N. meningitidis (strain A) 

ORF110 shows 91.5% identity over a 188aa overlap with ORF88a from strain A of N. meningitidis:

                    10        20        30        40        50        60
orf88a.pep  MSKSRRSPPLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGSFWA
                                          ||||||||||:|||||||||||||||||||
orf110                                    LLGIASVIGTLLQQNQPQTDYLVKFGSFWA
                                                  10        20        30

                    70        80        90       100       110       120
orf88a.pep  QIFGFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH
             |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf110      XIFGFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH
                    40        50        60        70        80        90

                   130       140       150       160       170       180
orf88a.pep  SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf110      SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL
                   100       110       120       130       140       150

                   190       200       210       220       230       240
orf88a.pep  GGLIDSNLLLKLGMLTGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADVVF
            |||||||||||||||||||    :  : :  |||| :|
orf110      GGLIDSNLLLKLGMLTGRIFRTIRRFMPRIXKPESXFGCVQSLIXGQRQYFXRGRVRMWF
                   160       170       180       190       200       210

                   250       260       270       280       290       300
orf88a.pep  LNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT

orf110      SX

However, ORF88 and ORF110 do not align, because they represent two different fragments of the
same protein.

Homology with a predicted ORF from N. gonorrhoeae

ORF110 shows 88.6% identity over a 211aa overlap with a predicted ORF (ORF110.ng) from N.
gonorrhoeae:

orf110.pep                                LLGIASVIGTLLQQNQPQTDYLVKFGSFWA 30
                                          ||||||||||:||||||||||||||| ||:
orf110ng    MSKSRISPTLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGPFWT 60

orf110.pep  XIFGFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 90
             || ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf110ng    RIFDFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 120

orf110.pep  SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL 150
            |||||||||||||||||||:||||||::||||||||||||||||||||| ||||||||||
orf110ng    SSLLDVKIAPEVAKRYLEVRGFQGKTVSREDGSVLIAAKKGTMNKWGYIXAHVALIVICL 180

orf110.pep  GGLIDSNLLLKLGMLTGRIFRTIRRFMPRIXKPESXFGCVQSLIXGQRQYFXRGRVRMWF 210
            | ||: |||||||||:| |||: || |||| |||| :| ||||| |||||| ||:|||||
orf110ng    GRLINXNLLLKLGMLAGSIFRNNRRVMPRISKPESIWGGVQSLIKGQRQYFQRGKVRMWF 240

orf110.pep  S 211
            |
orf110ng    S 241

The complete length ORF110ng nucleotide sequence <SEQ ID 443> is predicted to encode a
protein having amino acid sequence <SEQ ID 444>: 

1 MSKSRISPTL LSRPWFAFFS SMRF AVALLS LLGIASVIGT VL QQNQPQTD 

51 YLVKFGPFWT RIFDFLGLYD VYASAW FVVI MMFLVVSTSL CLI RNVPPFW 

101 REMKSFREKV KEKSLAAMRH SSLLDVKIAP EVAKRYLEVR GFQGKTVSRE 

151 DGSVLIAAKK GTMNKWGYIX AHVALIVICL GRLINXN LLL KLGMLAGSIF 

201 RNNRRVMPRI SKPESIWGGV QSLIKGQRQY FQRGKVRMWF S* 

Based on the putative transmembrane domains in the gonococcal protein, it is predicted that the
proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for
vaccines or diagnostics, or for raising antibodies.
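
Putative transmembrane domains of this kind are commonly screened for with a sliding-window hydropathy profile. The source does not say which method was used for these predictions, so the sketch below is only an illustration, assuming the Kyte-Doolittle hydropathy scale with a 19-residue window and a mean-hydropathy cutoff of 1.6; the function names and the cutoff choice are assumptions, not the original procedure.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of every full window; unknown residues (e.g. X) score 0."""
    scores = [KD.get(aa, 0.0) for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(protein, window=19, threshold=1.6):
    """1-based start positions of windows whose mean hydropathy >= threshold."""
    profile = hydropathy_profile(protein, window)
    return [i + 1 for i, h in enumerate(profile) if h >= threshold]


# N-terminal 50 residues of the gonococcal protein (SEQ ID 444).
ORF110NG_NTERM = "MSKSRISPTLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTD"

print(candidate_tm_windows(ORF110NG_NTERM))
```

On the N-terminal region of SEQ ID 444, the window starting at residue 25 (AVALLSLLGIASVIGTVLQ) has a mean hydropathy of about 2.07 and is flagged, while the window starting at residue 1 is not.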

Example 53 

The following DNA sequence was identified in N. meningitidis <SEQ ID 445>:

1 ATGCCGTCTG AAACACGCCT GCCGAACTTT ATCCGCGTCT TGATATTTGC 

51 CCTGGGTTTC ATCTTCCTGA ACGCCTGTTC GGAACAAACC GCGCAAACCG 

101 TTACCCTGCA AGGCGAAACG ATGGGCACGA CCTATACCGT CAAATACCTT 

151 TCAAATAATC GGGACAAACT CCCCTCACCT GCCGAAATAC AAAAACGCAT 

201 CGATGACGCG CTTAAAGAAG TCAACCGGCA GATGTCCACC TATCAGCCCG 

251 ACTCCGAAAT CAGCCGGTTC AACCAACACA CAGCCGGCAA GCCCCTCCGC 

301 ATTTCAAGCG ACTTCGCACA CGTTACTGCC GAAGCCGTCC GCCTGAACCG 

351 CCTGACACAC GGCGCGCTGG ACGTAACCGT CGGCCCCTTG GTCAACCTTT 

401 GGGGATTCGG CCCCGACAAA TCCGTTACCC GTGAACCGTC GCCGGAACAA

451 ATCAAACAGG CGGCATCTTA TACGGGCATA GACAAAATCA TTTTGAAACA 

501 AGGCAAAGAT TACGCTTCCT TGAGCAAAAC CCACCCCAAG GCCTATTTGG 

551 ATTTATCTTC GATTGCCAAA GGCTTCGGCG TTGATAAAGT TGCGGGCGAA 

601 CTGGAAAAAT ACGGCATTCA AAATTATCTG GTCGAAATCG GCGGCGAGTT 

651 GCACGGCAAA GGCAAAAACG CGCGCGGCGA ACCGTGGCGC ATCGGTATCG 

701 AGCAGCCCAA TATCGTCCAA GGCGGCAATA CGCAGATTAT CGTCCCGCTG 

751 AACAACCGTT CGCTTGCCAC TTCCGGCGAT TACCGTATTT TCCACGTCGA

801 TAAAAACGGC AAACGCCTCT CCCATATCAT CAACCCGAAC AACAAACGAC 

851 CCATCAGCCA CAACCTCGCC TCCATCAGCG TGGTCGCAGA CAGTGCGATG 

901 ACGGCGGACG GCTTGTCCAC AGGATTATTC GTATTGGGCG AAACCGAAGC 

951 CTTAAAGCTG GCAGAGCGCG AAAAACTCGC TGTTTTCCTG ATTGTCAGGG

1001 ATAAAGGCGG CTACCGCACC GCCATGTCTT CCGAATTTGA AAAACTGCTC

1051 CGCTAA

This corresponds to the amino acid sequence <SEQ ID 446; ORF111>:

1 MPSETRLPNF IRVLIFALGF IFLNA CSEQT AQTVTLQGET MGTTYTVKYL 
51 SNNRDKLPSP AEIQKRIDDA LKEVNRQMST YQPDSEISRF NQHTAGKPLR 
101 ISSDFAHVTA EAVRLNRLTH GALDVTVGPL VNLWGFGPDK SVTREPSPEQ 
151 IKQAASYTGI DKIILKQGKD YASLSKTHPK AYLDLSSIAK GFGVDKVAGE 
201 LEKYGIQNYL VEIGGELHGK GKNARGEPWR IGIEQPNIVQ GGNTQIIVPL 
251 NNRSLATSGD YRIFHVDKNG KRLSHIINPN NKRPISHNLA SISVVADSAM
301 TADGLSTGLF VLGETEALKL AEREKLAVFL IVRDKGGYRT AMSSEFEKLL 
351 R* 

Computer analysis of this amino acid sequence gave the following results:

Homology with a predicted ORF from N. meningitidis (strain A)

ORF111 shows 96.9% identity over a 351aa overlap with an ORF (ORF111a) from strain A of N.
meningitidis:

                     10        20        30        40        50        60
orf111a.pep  MPSETRLPNFIRTLIFALSFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDXLPSP
             ||||||||||||:|||||:|||||||||||||||||||||||||||||||||||| ||||
orf111       MPSETRLPNFIRVLIFALGFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDKLPSP
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf111a.pep  AEIQXRIDDALKEVNRQMSTYQPDSEISRFNQHTAGKPLRISSDFAHVTAEAVHLNRLTH
             |||| ||||||||||||||||||||||||||||||||||||||||||||||||:||||||
orf111       AEIQKRIDDALKEVNRQMSTYQPDSEISRFNQHTAGKPLRISSDFAHVTAEAVRLNRLTH
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf111a.pep  GALDVTVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILKQGKDYASLSKTHPK
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf111       GALDVTVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILKQGKDYASLSKTHPK
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf111a.pep  AYLDLSSIAKGFGVDXVAGELEKYGIQNYLVEIGGELHGKXKNARGEPWRIGIEQPNIVQ
             ||||||||||||||| |||||||||||||||||||||||| |||||||||||||||||||
orf111       AYLDLSSIAKGFGVDKVAGELEKYGIQNYLVEIGGELHGKGKNARGEPWRIGIEQPNIVQ
                    190       200       210       220       230       240

                    250       260       270       280       290       300
orf111a.pep  GGNTQIIVPLNNRSXATSGDYRIFHVDKSGKRLSHIINPNNKRPISHNLASISVXADSAM
             |||||||||||||| |||||||||||||:||||||||||||||||||||||||| |||||
orf111       GGNTQIIVPLNNRSLATSGDYRIFHVDKNGKRLSHIINPNNKRPISHNLASISVVADSAM
                    250       260       270       280       290       300

                    310       320       330       340       350
orf111a.pep  TADGXSTGLFVLGETEALKLAEREKLAVFLIVRDKGGYRTAMSSEFEKLLRX
             |||| |||||||||||||||||||||||||||||||||||||||||||||||
orf111       TADGLSTGLFVLGETEALKLAEREKLAVFLIVRDKGGYRTAMSSEFEKLLRX
                    310       320       330       340       350

The complete length ORF111a nucleotide sequence <SEQ ID 447> is:

1 ATGCCGTCTG AAACACGCCT GCCGAACTTT ATCCGCACCT TGATATTTGC 

51 CCTGAGTTTT ATCTTCCTGA ACGCCTGTTC GGAACAAACC GCGCAAACCG 

101 TTACCCTGCA AGGTGAAACG ATGGGCACGA CCTATACCGT CAAATACCTT 

151 TCAAATAATC GGGACNAACT CCCNTCACCT GCCGAAATAC AAAANCGCAT 

201 CGATGACGCG CTTAAAGAAG TCAACCGGCA GATGTCCACC TATCAGCCCG 

251 ACTCCGAAAT CAGCCGGTTC AACCAACACA CAGCCGGCAA GCCCCTCCGC 

301 ATTTCAAGCG ACTTCGCACA CGTTACTGCC GAAGCCGTCC ACCTGAACCG 

351 CCTGACACAC GGCGCGCTGG ACGTAACCGT CGGCCCCTTG GTCAACCTTT 

401 GGGGATTCGG CCCCGACAAA TCCGTTACCC GTGAACCGTC GCCGGAACAA

451 ATCAAACAAG CAGCATCTTA TACGGGCATA GACAAAATCA TTTTGAAACA

501 AGGCAAAGAT TACGCTTCCT TGAGCAAAAC CCACCCCAAG GCCTATTTGG 

551 ATTTATCTTC GATTGCCAAA GGCTTCGGCG TTGATNANGT TGCGGGCGAA 

601 CTGGAAAAAT ACGGCATTCA AAATTATCTG GTCGAAATCG GCGGNGAGTT 

651 GCACGGCAAA GNCAAAAACG CGCGCGGCGA ACCTTGGCGC ATCGGCATCG 

701 AACAGCCCAA CATCGTCCAA GGCGGCAATA CGCAGATTAT CGTCCCGCTG

751 AACAACCGTT CGNTTGCCAC TTCCGGCGAT TACCGTATTT TCCACGTCGA 

801 TAAAAGCGGC AAACGCCTCT CCCATATCAT TAATCCGAAC AACAAACGAC 

851 CCATCAGCCA CAACCTCGCC TCCATCAGCG TGNTCGCAGA CAGTGCGATG 

901 ACGGCGGACG GCTTNTCCAC AGGATTATTC GTATTGGGCG AAACCGAAGC 

951 CTTAAAGCTG GCAGAGCGCG AAAAACTCGC TGTTTTCCTG ATTGTCAGGG 

1001 ATAAAGGCGG CTACCGCACC GCCATGTCTT CCGAATTTGA AAAACTGCTC 

1051 CGCTAA 

This encodes a protein having amino acid sequence <SEQ ID 448>: 



1 MPSETRLPNF IRTLIFALSF IFLNA CSEQT AQTVTLQGET MGTTYTVKYL 

51 SNNRDXLPSP AEIQXRIDDA LKEVNRQMST YQPDSEISRF NQHTAGKPLR 

101 ISSDFAHVTA EAVHLNRLTH GALDVTVGPL VNLWGFGPDK SVTREPSPEQ 

151 IKQAASYTGI DKIILKQGKD YASLSKTHPK AYLDLSSIAK GFGVDXVAGE 

201 LEKYGIQNYL VEIGGELHGK XKNARGEPWR IGIEQPNIVQ GGNTQIIVPL 

251 NNRSXATSGD YRIFHVDKSG KRLSHIINPN NKRPISHNLA SISVXADSAM 

301 TADGXSTGLF VLGETEALKL AEREKLAVFL IVRDKGGYRT AMSSEFEKLL 

351 R* 



Homology with a predicted ORF from N.gonorrhoeae 

ORF1 1 1 shows 96.6% identity over a 35 laa overlap with a predicted ORF (ORF1 1 1 .ng) from N. 
gonorrhoeae: 

                  10        20        30        40        50        60
orf111ng  MPSETRLPNLIRALIFALGFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDKLPSP
          |||||||||:||:|||||||||||||||||||||||||||||||||||||||||||||||
orf111    MPSETRLPNFIRVLIFALGFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDKLPSP
                  10        20        30        40        50        60

                  70        80        90       100       110       120
orf111ng  AKIQKRIDDALKEVNRQMSTYQTDSEISRFNQHTAGKPLRISSDFAHVTAEAVRLNRLTH
          |:|||||||||||||||||||| |||||||||||||||||||||||||||||||||||||
orf111    AEIQKRIDDALKEVNRQMSTYQPDSEISRFNQHTAGKPLRISSDFAHVTAEAVRLNRLTH
                  70        80        90       100       110       120

                 130       140       150       160       170       180
orf111ng  GALDVTVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILQQGKDYASLSKTHPK
          |||||||||||||||||||||||||||||||||||||||||||||:||||||||||||||
orf111    GALDVTVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILKQGKDYASLSKTHPK
                 130       140       150       160       170       180

                 190       200       210       220       230       240
orf111ng  AYLDLSSIAKGFGVDKVAGELEKYGIQNYLVEIGGELHGKGKNAHGEPWRIGIEQPNIIQ
          ||||||||||||||||||||||||||||||||||||||||||||:|||||||||||||:|
orf111    AYLDLSSIAKGFGVDKVAGELEKYGIQNYLVEIGGELHGKGKNARGEPWRIGIEQPNIVQ
                 190       200       210       220       230       240

                 250       260       270       280       290       300
orf111ng  GGNTQIIVPLNNRSLATSGDYRIFHVDKNGKRLSHIINPNNKRPISHNLASISVVSDSAM
          |||||||||||||||||||||||||||||||||||||||||||||||||||||||:||||
orf111    GGNTQIIVPLNNRSLATSGDYRIFHVDKNGKRLSHIINPNNKRPISHNLASISVVADSAM
                 250       260       270       280       290       300

                 310       320       330       340       350
orf111ng  TADGLSTGLFVLGETEALRLAEQEKLAVFLIVRDKDGYRTAMSSEFAKLLRX
          ||||||||||||||||||:|||:|||||||||||| |||||||||| |||||
orf111    TADGLSTGLFVLGETEALKLAEREKLAVFLIVRDKGGYRTAMSSEFEKLLRX
                 310       320       330       340       350



The complete length ORF111ng nucleotide sequence <SEQ ID 449> is:

1 ATGCCGTCTG AAACACGCCT GCCGAACCTT ATCCGCGCCT TGATATTTGC

51 CCTGGGTTTC ATCTTCCTGA ACGCCTGTTC GGaacaaacC GCGCAaaccg 

101 TTACCCTGCA AGGCGAAAcg aTGGGTACGA CCTATACCGT CAAATACCTT 

151 TCAAATAATC GGGACAAACT CCCCTCCCCT GCCAAAATAC AAAAGCGCAT 

201 TGATGATGCG CTTAAAGAAG TCAACCGGCA GATGTCCACC TACCAGACCG 

251 ATTCCGAAAT CAGCCGGTTC AACCAACACA CAGCCGGCAA GCCCCTCCGC 

301 ATTTCAAGCG ATTTCGCACA CGTTACCGCC GAAGCCGTCC GCCTGAACCG 

351 CCTGACTCAC GGCGCACTGG ACGTAACCGT CGGCCCTTTG GTCAACCTTT 

401 GGGGGTTCGG CCCCGACAAA TCCGTTACCC GTGAACCGTC GCCGGAACAA 

451 ATCAAACAGG CGGCATCTTA TACGGGCATA GACAAAATCA TTTTGCAACA 

501 AGGCAAAGAT TACGCTTCCT TGAGCAAAAC CCACCCCAAA GCCTATTTGG

551 ATTTATCTTC GATTGCCAAA GGCTTCGGCG TTGATAAAGT TGCGGGCGAA 

601 CTGGAAAAAT ACGGCATTCA AAATTATCTG GTCGAAAtcg gcggcGAGTT 

651 GCACGGCAAA GGCAAAAATG CGCACGGCGA ACCGTGGCGC ATCGGTATAG 

701 AGCAACCCAA TATCATCCAA GgcgGCAata CGCAGATTAt cgtcccgctg 

751 aaCaaccgtt cgctTGCCAC TTCCGGCGAT TAccgtaTTT tccacgtcgA 

801 TAAAAAcggc aaacgccttt cccacaTCAT CAATCCCaAC aacAAACgac 

851 ccATCAGcca caacctcgcc tccatcagcg tggtctcAGA CAGTGCAATG 

901 ACGGCGGACG GTTtatCCAC AGGATTATTT GTTTTAGGCG AAACCGAAGC 

951 CTTAAGGCTG GCAGAACAAG AAAAACTCGC TGTTTTCCTA ATTGTCCGGG 

1001 ATAAGGACGG CTACCGCACC GCCATGTCTT CCGAATTTGC CAAGCTGCTC 

1051 CGCTAA 

This encodes a protein having amino acid sequence <SEQ ID 450>: 

1 MPSETRLPNL IRALIFALGF IFLNACSEQT AQTVTLQGET MGTTYTVKYL

51 SNNRDKLPSP AKIQKRIDDA LKEVNRQMST YQTDSEISRF NQHTAGKPLR 

101 ISSDFAHVTA EAVRLNRLTH GALDVTVGPL VNLWGFGPDK SVTREPSPEQ 

151 IKQAASYTGI DKIILQQGKD YASLSKTHPK AYLDLSSIAK GFGVDKVAGE 

201 LEKYGIQNYL VEIGGELHGK GKNAHGEPWR IGIEQPNIIQ GGNTQIIVPL 

251 NNRSLATSGD YRIFHVDKNG KRLSHIINPN NKRPISHNLA SISVVSDSAM

301 TADGLSTGLF VLGETEALRL AEQEKLAVFL IVRDKDGYRT AMSSEFAKLL 

351 R* 
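The relationship between a nucleotide listing and the protein it encodes can be reproduced with a direct codon-by-codon translation. The sketch below assumes the standard genetic code and reading frame 1; the names `CODON_TABLE` and `translate` are illustrative, and codons containing ambiguous or unrecognised bases are reported as X.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; '*' marks a stop codon."""
    # Drop spaces and position numbers from the listing; accept mixed case.
    cleaned = "".join(ch for ch in dna.upper() if ch.isalpha())
    protein = []
    for i in range(0, len(cleaned) - 2, 3):
        protein.append(CODON_TABLE.get(cleaned[i:i + 3], "X"))
    return "".join(protein)


# First 30 bases of SEQ ID 449 as listed above.
print(translate("ATGCCGTCTG AAACACGCCT GCCGAACCTT"))
```

The output for these 30 bases corresponds to the first ten residues of SEQ ID 450.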

This protein shows homology with a hypothetical lipoprotein precursor from H. influenzae:

sp|P44550|YOJLJiAEIN HYPOTHETICAL LIPOPROTEIN HI0172 PRECURSOR >gi 1 1074292 Ipir I 4 
hypothetical protein HI0172 - Haemophilus influenzae (strain Rd KW20)
>gi|1573128 (U32702) hypothetical [Haemophilus influenzae] Length = 346
Score = 353 bits (896), Expect = 9e-97

Identities = 181/344 (52%), Positives = 247/344 (71%), Gaps = 4/344 (1%)

Query: 7 LPNLIRALIFALGFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDKLPSPAKIQKR 66
+ LI +I + L AC ++T + ++L G+TMGTTY VKYL + S K +
Sbjct: 1 MKKLISGIIAVAMALSLAACQKET-KVISLSGKTMGTTYHVKYLDDGSITATSE-KTHEE 58

Query: 67 IDDALKEVNRQMSTYQTDSEISRFNQHT-AGKPLRISSDFAHVTAEAVRLNRLTHGALDV 125
I+ LK+VN +MSTY+ DSE+SRFNQ+T P+ IS+DFA V AEA+RLN++T GALDV
Sbjct: 59 IEAILKDVNAKMSTYKKDSELSRFNQNTQVNTPIEISADFAKVLAEAIRLNKVTEGALDV 118

Query: 126 TVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILQQGKDYASLSKTHPKAYLDL 185
TVGP+VNLWGFGP+K ++P+PEQ+ + ++ GIDKI L K+ A+LSK P+ Y+DL
Sbjct: 119 TVGPVVNLWGFGPEKRPEKQPTPEQLAERQAWVGIDKITLDTNKEKATLSKALPQVYVDL 178

Query: 186 SSIAKGFGVDKVAGELEKYGIQNYLVEIGGELHGKGKNAHGEPWRIGIEQPNIIQGGNTQ 245
SSIAKGFGVD+VA +LE+ QNY+VEIGGE+ KGKN G+PW+I IE+P +
Sbjct: 179 SSIAKGFGVDQVAEKLEQLNAQNYMVEIGGEIRAKGKNIEGKPWQIAIEKPTTTGERAVE 238

Query: 246 IIVPLNNRSLATSGDYRIFHVDKNGKRLSHIINPNNKRPISHNLASISVVSDSAMTADGL 305
++ LNN +A+SGDYRI+ ++NGKR +H I+P PI H+LASI+V++ ++MTADGL
Sbjct: 239 AVIGLNNMGMASSGDYRIY-FEENGKRFAHEIDPKTGYPIQHHLASITVLAPTSMTADGL 297

Query: 306 STGLFVLGETEALRLAEQEKLAVFLIVRDKDGYRTAMSSEFAKL 349
STGLFVLGE +AL +AE+ LAV+LI+R +G+ T SS F KL
Sbjct: 298 STGLFVLGEDKALEVAEKNNLAVYLIIRTDNGFVTKSSSAFKKL 341

Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 54

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 451>:

1 . . CCGTGCCGCC GACAGGGCGA CGACGTGTAT GCGGCGCACG CGTCCCGTCA 

51 AAAATTGTGG CTGCGCTTCA TCGGCGGCCG GTCGCATCAA AATATACGGG 

101 GCGGCGCGGC TGCGGACGGG TGGCGCAAAG GCGTGCAAAT CGGCGGCGAG

151 GTGTTTGTAC GGCAAAATGA AGGCAGCCkA yTGGCAATCG GCGTGATGGG 

201 CGGCAGGGCC GGCCAGCACG CwTCAGTCAA CGGCAAAGGC GGTGCGGCAG 

251 gCAGTGATTT GTATGGTTAT GgCGGGGgTG TTTATGCTgC GTGGCATCAG 

301 TTGCGCGATA AACAAACGGG TgCGTATTTG GACGGCTGGT TGCAATACCA 

351 ACGTTTCAAA CACCGCATCA ATGATGAAAA CCGTGCGGAA CgCTACAAAA

401 CCAAAGGTTG GACGGCTTCT GTCGAAGGCG GCTACAACGC GCTTGTGGCG 

451 GAAGGCATTG TCGGAAAAGG CAATAATGTG CGGTTTTACC TACAACCGCA 

501 GgCGCAGTTT ACCTACTTGG GCGTAAACGG CGGCTTTACC GACAGCGAGG 

551 GGACGGCGGT CGGACTGCTC GGCAGCGGTC AGTGGCAAAG CCGCGCCGGC 

601 AtTCGGGCAA AAACCCGTTT TGCTTTGCGT AACGGTGTCA ATCTTCAGCC

651 TTTTGCCGCT TTTAATGTtt TGCACAGGTC AAAATCTTTC GGCGTGGAAA 

701 TGGACGGCGA AAAACAGACG CTGGCAGGCA GGACGGCACT CGAAGGGCGG 

751 TTCGGTATTG AAGCCGGTTG GAAAGGCCAT ATGTCCGCA . . 

This corresponds to the amino acid sequence <SEQ ID 452; ORF35>: 

1 . . PCRRQGDDVY AAHASRQKLW LRFIGGRSHQ NIRGGAAADG WRKGVQIGGE

51 VFVRQNEGSX LAIGVMGGRA GQHASVNGKG GAAGSDLYGY GGGVYAAWHQ 

101 LRDKQTGAYL DGWLQYQRFK HRINDENRAE RYKTKGWTAS VEGGYNALVA 

151 EGIVGKGNNV RFYLQPQAQF TYLGVNGGFT DSEGTAVGLL GSGQWQSRAG 

201 IRAKTRFALR NGVNLQPFAA FNVLHRSKSF GVEMDGEKQT LAGRTALEGR 

251 FGIEAGWKGH MSA. .

Computer analysis of this amino acid sequence gave the following results: 

Homology with putative secreted VirG-homologue of N. meningitidis (accession number
A32247)

ORF35 and virg-h protein show 51% aa identity in 261aa overlap:

Orf35 5 QGDDVYAAHASRQKLWLRFIGGRSHQNIRGGAA-ADGWRKGVQIGGEVFVRQNEGSXLAI 63
+ D++ R+ LWLR I G S+Q ++G A +G+RKGVQ+GGEVF QNE + L+I
virg-h 396 KNSDIFDRTLPRKGLWLRVIDGHSNQWVQGKTAPVEGYRKGVQLGGEVFTWQNESNQLSI 455

Orf35 64 GVMGGRAGQHASVNGKG--GAAGSDLYGYGGGVYAAWHQLRDKQTGAYLDGWLQYQRFKH 121
G+MGG+A Q ++ + ++ G+G GVYA WHQL+DKQTGAY D W+QYQRF+H
virg-h 456 GLMGGQAEQRSTFHNPDTDNLTTGNVKGFGAGVYATWHQLQDKQTGAYADSWMQYQRFRH 515

Orf35 122 RINDENRAERYKTKGWTASVEGGYNALVAEGIVGKGNNVRFYLQPQAQFTYLGVNGGFTD 181
RIN E+ ER+ +KG TAS+E GYNAL+AE KGN++R YLQPQAQ TYLGVNG F+D
virg-h 516 RINTEDGTERFTSKGITASIEAGYNALLAEHFTKKGNSLRVYLQPQAQLTYLGVNGKFSD 575

Orf35 182 SEGTAVGLLGSGQWQSRAGIRAKTRFALRNGVNLQPFAAFNVLHRSKSFGVEMDGEKQTL 241
SE V LLGS Q Q+R G++AK +F+L + ++PFAA N L+ +K FGVEMDGE++ +
virg-h 576 SENAHVNLLGSRQLQTRVGVQAKAQFSLYKNIAIEPFAAVNALYHNKPFGVEMDGERRVI 635

Orf35 242 AGRTALEGRFGIEAGWKGHMS 262
+TA+E + G+ K H++
virg-h 636 NNKTAIESQLGVAVKIKSHLT 656



Homology with a predicted ORF from N. meningitidis (strain A)

ORF35 shows 96.9% identity over a 259aa overlap with an ORF (ORF35a) from strain A of N. 
meningitidis: 

                                             10        20        30
orf35.pep                            PCRRQGDDVYAAHASRQKLWLRFIGGRSHQNIRG
                                         :|||||||  ||||||||||||||||||||
orf35a     QRLAIPEAEAVLYAQQAYAANTLFGLRAADRGDDVYAADPSRQKLWLRFIGGRSHQNIRG
          310       320       330       340       350       360

               40        50        60        70        80        90
orf35.pep  GAAADGWRKGVQIGGEVFVRQNEGSXLAIGVMGGRAGQHASVNGKGGAAGSDLYGYGGGV
           |||||| |||||||||||||||||| ||||||||||||||||||||||||| |:||||||
orf35a     GAAADGRRKGVQIGGEVFVRQNEGSRLAIGVMGGRAGQHASVNGKGGAAGSYLHGYGGGV
          370       380       390       400       410       420

              100       110       120       130       140       150
orf35.pep  YAAWHQLRDKQTGAYLDGWLQYQRFKHRINDENRAERYKTKGWTASVEGGYNALVAEGIV
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||:|
orf35a     YAAWHQLRDKQTGAYLDGWLQYQRFKHRINDENRAERYKTKGWTASVEGGYNALVAEGVV
          430       440       450       460       470       480

              160       170       180       190       200       210
orf35.pep  GKGNNVRFYLQPQAQFTYLGVNGGFTDSEGTAVGLLGSGQWQSRAGIRAKTRFALRNGVN
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf35a     GKGNNVRFYLQPQAQFTYLGVNGGFTDSEGTAVGLLGSGQWQSRAGIRAKTRFALRNGVN
          490       500       510       520       530       540

              220       230       240       250       260
orf35.pep  LQPFAAFNVLHRSKSFGVEMDGEKQTLAGRTALEGRFGIEAGWKGHMSA
           |||||||||||||||||||||||||||||||||||||||||||||||||
orf35a     LQPFAAFNVLHRSKSFGVEMDGEKQTLAGRTALEGRFGIEAGWKGHMSARIGYGKRTDGD
          550       560       570       580       590       600

orf35a     KEAALSLKWLFX
          610       620

The complete length ORF35a nucleotide sequence <SEQ ID 453> is: 

1 ATGTTCAGAG CTCAGCTTGG TTCAAATACT CGTTCTACCA AAATCGGCGA 

51 CGATGCCGAT TTTTCATTTT CAGACAAGCC GAAACCCGGC ACTTCCCATT 

101 ATTTTTCCAG CGGTAAAACC GATCAAAATT CATCCGAATA TGGGTATGAC 

151 GAAATCAATA TCCAAGGTAA AAACTACAAT AGCGGCATAC TCGCCGTCGA 

201 TAATATGCCC GTTGTTAAGA AATATATTAC AGATACTTAC GGGGATAATT

251 TAAAGGATGC GGTTAAGAAG CAATTACAGG ATTTATACAA AACAAGACCC 

301 GAAGCTTGGG AAGAAAATAA AAAACGGACT GAGGAGGCGT ATATAGAACA 

351 GCTTGGACCA AAATTTAGTA TACTCAAACA GAAAAACCCC GATTTAATTA 

401 ATAAATTGGT AGAAGATTCC GTACTCACTC CTCATAGTAA TACATCACAG 

451 ACTAGTCTCA ACAACATCTT CAATAAAAAA TTACACGTCA AAATCGAAAA 

501 CAAATCCCAC GTCGCCGGAC AGGTGTTGGA ACTGACCAAG ATGACGCTGA 

551 AAGATTCCCT TTGGGAACCG CGCCGCCATT CCGACATCCA TATGCTGGAA 

601 ACTTCCGATA ATGCCCGCAT CCGCCTGAAC ACGAAAGATG AAAAACTGAC 

651 CGTCCATAAA GCGTATCAGG GCGGTGCGGA TTTCCTGTTC GGCTACGACG 

701 TGCGGGAGTC GGACAAACCC GCCCTGACCT TTGAAGAAAA AGTCAGCGGA 

751 CAATCCGGCG TGGTTTTGGA ACGCCGGCCG GAAAATCTGA AAACGCTCGA 

801 CGGGCGCAAA CTGATTGCGG CGGAAAAGGC AGACTCTAAT TCGTTTGCGT 

851 TTAAACAAAA TTACCGGCAG GGACTGTACG AATTATTGCT CAAGCAATGC 

901 GAAGGCGGAT TTTGCTTGGG CGTGCAGCGT TTGGCTATCC CCGAGGCGGA 

951 AGCGGTTTTA TATGCCCAAC AGGCTTATGC GGCAAATACT TTGTTCGGGC 

1001 TGCGTGCCGC CGACAGGGGC GACGACGTGT ATGCCGCCGA TCCGTCCCGT 

1051 CAAAAATTGT GGCTGCGCTT CATCGGCGGC CGGTCGCATC AAAATATACG 

1101 GGGCGGCGCG GCTGCGGACG GGCGGCGCAA AGGCGTGCAA ATCGGCGGCG 

1151 AGGTGTTTGT ACGGCAAAAT GAAGGCAGCC GGCTGGCAAT CGGCGTGATG 

1201 GGCGGCAGGG CTGGCCAGCA CGCATCAGTC AACGGCAAAG GCGGTGCGGC 

1251 AGGCAGTTAT TTGCATGGTT ATGGCGGGGG TGTTTATGCT GCGTGGCATC 

1301 AGTTGCGCGA TAAACAAACG GGTGCGTATT TGGACGGCTG GTTGCAATAC 

1351 CAACGTTTCA AACACCGCAT CAATGATGAA AACCGTGCGG AACGCTACAA 

1401 AACCAAAGGT TGGACGGCTT CTGTCGAAGG CGGCTACAAC GCGCTTGTGG

1451 CGGAAGGCGT TGTCGGAAAA GGCAATAATG TGCGGTTTTA CCTGCAACCG 

1501 CAGGCGCAGT TTACCTACTT GGGCGTAAAC GGCGGCTTTA CCGACAGCGA 

1551 GGGGACGGCG GTCGGACTGC TCGGCAGCGG TCAGTGGCAA AGCCGCGCCG 

1601 GCATTCGGGC AAAAACCCGT TTTGCTTTGC GTAACGGTGT CAATCTTCAG 

1651 CCTTTTGCCG CTTTTAATGT TTTGCACAGG TCAAAATCTT TCGGCGTGGA 

1701 AATGGACGGC GAAAAACAGA CGCTGGCAGG CAGGACGGCG CTCGAAGGGC 

1751 GGTTCGGCAT TGAAGCCGGT TGGAAAGGCC ATATGTCCGC ACGCATCGGA 

1801 TACGGCAAAA GGACGGACGG CGACAAAGAA GCCGCATTGT CGCTCAAATG

1851 GCTGTTTTGA 



This encodes a protein having amino acid sequence <SEQ ID 454>:



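1 MFRAQLGSNT RSTKIGDDAD FSFSDKPKPG TSHYFSSGKT DQNSSEYGYD

51 EINIQGKNYN SGILAVDNMP VVKKYITDTY GDNLKDAVKK QLQDLYKTRP

101 EAWEENKKRT EEAYIEQLGP KFSILKQKNP DLINKLVEDS VLTPHSNTSQ

151 TSLNNIFNKK LHVKIENKSH VAGQVLELTK MTLKDSLWEP RRHSDIHMLE

201 TSDNARIRLN TKDEKLTVHK AYQGGADFLF GYDVRESDKP ALTFEEKVSG

251 QSGVVLERRP ENLKTLDGRK LIAAEKADSN SFAFKQNYRQ GLYELLLKQC

301 EGGFCLGVQR LAIPEAEAVL YAQQAYAANT LFGLRAADRG DDVYAADPSR

351 QKLWLRFIGG RSHQNIRGGA AADGRRKGVQ IGGEVFVRQN EGSRLAIGVM

401 GGRAGQHASV NGKGGAAGSY LHGYGGGVYA AWHQLRDKQT GAYLDGWLQY

451 QRFKHRINDE NRAERYKTKG WTASVEGGYN ALVAEGVVGK GNNVRFYLQP

501 QAQFTYLGVN GGFTDSEGTA VGLLGSGQWQ SRAGIRAKTR FALRNGVNLQ

551 PFAAFNVLHR SKSFGVEMDG EKQTLAGRTA LEGRFGIEAG WKGHMSARIG

601 YGKRTDGDKE AALSLKWLF*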



Homology with a predicted ORF from N. gonorrhoeae

ORF35 shows 51.7% identity over a 261aa overlap with a predicted ORF (ORF35ngh) from N.
gonorrhoeae:

orf35.pep PCRRQGDDVYAAHASRQKLWLRFIGGRSHQNIRG 34

:::(:: I : I I I I I I : I : I : : I

orf35ngh FTKVQERDDIAIYAQQAQAANTLFALRLNDKNSDIFDRTLPRKGLWLRVIDGHSNQWVQG 370

orf35.pep GAA-ADGWRKGVQIGGEVFVRQNEGSXLAIGVMGGRAGQHASVNGKG--GAAGSDLYGYG 91

: I : : I : I I I I I : ! I I I I : III:: I : I I : I I I : I I : : : : : : : : : I : I
orf35ngh KTAPVEGYRKGVQLGGEVFTWQNESNQLSIGLMGGQAEQRSTFRNPDTDNLTTGNVKGFG 430

orf35.pep GGVYAAWHQLRDKQTGAYLDGWLQYQRFKHRINDENRAERYKTKGWTASVEGGYNALVAE 151

: I Ml : I I I I : i i I II 1! : t : I : I I I I I : I I I t I : I I : Hi I I I : I : I I I I t : I I
orf35ngh AGVYATWHQLQDKQTGAYVDSWMQYQRFRHRINTEYATERFTSKGITASIEAGYNALLAE 490

orf35.pep GIVGKGNNVRFYLQPQAQFTYLGVNGGFTDSEGTAVGLLGSGQWQSRAGIRAKTRFALRN 211

: : |||::| I I I I I I I : I I I I I I I I : I I I : : I : I I I I I I I I : I : : I I : : H : I
orf35ngh HFTKKGNSLRVYLQPQAQLTYLGVNGKFSDSENAQVNLLGSRQLQSRVGVQAKAQFAFTN 550

orf35.pep GVNLQPFAAFNVLHRSKSFGVEMDGEKQTLAGRTALEGRFGIEAGWKGHMSA 263

I I : : I I I": I I : : : : I II I I : I I : : : : : : : I :: I : : I : I I : I : :

orf35ngh GVTFQPFVAVNSIYQQKPFGVEIDGDRRVINNKTVIETQLGVAAKIKSHLTLQASFNRQT 610

A partial ORF35ngh nucleotide sequence <SEQ ID 455> is predicted to encode a protein having 
partial amino acid sequence <SEQ ID 456>: 

1 . . KKLRDRNSEY WKEETYHIKS NGRTYPNIPA LFPKHPFDPF ENINNSKKIS 

51 FYDKEYTEDY LVGFARGFGV EKRNGEEEKP LRQYFKDCVN TENSNNDNCK

101 ISSFGNYGPI LIKSDIFALA SQIKNSHINS EILSVGNYIE WLRPTLNKLT 

151 GWQEHLYAGL DPFHYIEVTD NSHVIGQTID LGALELTNSL WKPRWNSNID 

201 YLITKNAEIR FNTKNESLLV KEDYAGGARF RFAYDLKDKV PEIPVLTFEK 

251 NITGTSDIIF EGKALDNLKH LDGHQIVKVN DTADKDAFRL SSKYRKGIYT 

301 LSLQQRPEGF FTKVQERDDI AIYAQQAQAA NTLFALRLND KNSDIFDRTL

351 PRKGLWLRVI DGHSNQWVQG KTAPVEGYRK GVQLGGEVFT WQNESNQLSI 

401 GLMGGQAEQR STFRNPDTDN LTTGNVKGFG AGVYATWHQL QDKQTGAYVD 

451 SWMQYQRFRH RINTEYATER FTSKGITASI EAGYNALLAE HFTKKGNSLR 

501 VYLQPQAQLT YLGVNGKFSD SENAQVNLLG SRQLQSRVGV QAKAQFAFTN 

551 GVTFQPFVAV NSIYQQKPFG VEIDGDRRVI NNKTVIETQL GVAAKIKSHL

601 TLQASFNRQT SKHHHAKQGA LNLQWTF* 

Based on this prediction, these proteins from N. meningitidis and N. gonorrhoeae, and their epitopes,
could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 55 






The following partial DNA sequence was identified in N. meningitidis <SEQ ID 457>:

1 . GCGGAATATG TTCAGTTCTC TATAGATTTG TTCAGTGTGG GTAAATCGGG

51 GGGCGGTATA CCTAAGGCTA AGCCTGTGTT TGATGCGAAA CCGAGATGGG 

101 AGGTTGATAG GAAGCTTAAT AAATTGACAA CTCGTGAGCA GGTGGAGAAA 

151 AATGTTCAGG AAACGAGAAG AAGGAGTCAG AGTAGTCAGT TTAAAGCCCA 

201 TGCGCAACGA GAATGGGAAA ATAAAACAGG GTTAGATTTT AATCATTTTA 

251 TAGGTGGTGA TATCAATAAA AAAGGCACAG TAACAGGAGG GCATAGTCTA

301 ACCCGTGGTG ATGTACGGGT GATACAACAA ACCTCGGCAC CTGATAAACA 

351 TGGGGT.TTA TCAAGCGACA GTGGAAATTN A 

This corresponds to the amino acid sequence <SEQ ID 458; ORF46>: 

1 ..AEYVQFSIDL FSVGKSGGGI PKAKPVFDAK PRWEVDRKLN KLTTREQVEK 

51 NVQETRRRSQ SSQFKAHAQR EWENKTGLDF NHFIGGDINK KGTVTGGHSL 

101 TRGDVRVIQQ TSAPDKHGXL SSDSGNX

Further work revealed further partial nucleotide sequence <SEQ ID 459>: 



1 . . GCAGTGTGCC TnCCGATGCA TGCACACGCC TCAnATTTGG CAAACGATTC 

51 TTTTATCCGG CAGGTTCTCG ACCGTCAGCA TTTCGAACCC GACGGGAAAT 

101 ACCACCTATT CGGCAGCAGG GGGGAACTTG CCGAGCGCCA GTCTCATATC 

151 GGATTGGGAA AAATACAAAG CCATCAGTTG GGCAACCTGA TGATTCAACA 

201 GGCGGCCATT AAAGGAAATA TCGGCTACAT TGTCCGCTTT TCCGATCACG 

251 GGCACGAAGT CCATTCCCCs TTCGACAACC ATGCCTCACA TTCCGATTCT 

301 GATGAAGCCG GTAGTCCCGT TGACGGATTT AGCCTTTACC GCATCCATTG 

351 GGACGGATAC GAACACCATC CCGCCGACGG CTATGACGGG CCACAGGGCG 

401 GCGGCTATCC CGCTCCCAAA GGCGCGAGGG ATATATACAG TTACGACATA

451 AAAGGCGTTG CCCAAAATAT CCGCCTCAAC CTGACCGACA ACCGCAGCAC

501 CGGACAACGG CTTGCCGACC GTTTCCACAA TGCCGGTAGT ATGCTGACGC 

551 AAGGAGTAGG CGACGGATTC AAACGCGCCA CCCGATACAG CCCCGAGCTG 

601 GACAGATCGG GCAATGCCGC CGAAGCCTTC AACGGCACTG CAGATATCGT 

651 TAAAAACATC ATCGGCGCTG CAGGAGAAAT TGT

This corresponds to the amino acid sequence <SEQ ID 460; ORF46-1>:



1 . . AVCLPMHAHA SXLANDSFIR QVLDRQHFEP DGKYHLFGSR GELAERQSHI 

51 GLGKIQSHQL GNLMIQQAAI KGNIGYIVRF SDHGHEVHSP FDNHASHSDS 

101 DEAGSPVDGF SLYRIHWDGY EHHPADGYDG PQGGGYPAPK GARDIYSYDI 

151 KGVAQNIRLN LTDNRSTGQR LADRFHNAGS MLTQGVGDGF KRATRYSPEL 

201 DRSGNAAEAF NGTADIVKNI IGAAGEI 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. gonorrhoeae 

ORF46 shows 98.2% identity over a 111aa overlap with a predicted ORF (ORF46ng) from N.
gonorrhoeae:



orf46.pep                 AEYVQFSIDLFSVGKSGGGIPKAKPVFDAKPRWEVDRKLNKLTTR 45
                                         ||||||||||||||||||||||||||||||
orf46ng    PKTGVPFDGKGFPNFEKHVKYDTKLDIQELSGGGIPKAKPVFDAKPRWEVDRKLNKLTTR 217

orf46.pep  EQVEKNVQETRRRSQSSQFKAHAQREWENKTGLDFNHFIGGDINKKGTVTGGHSLTRGDV 105
           |||||||||||||||||||||||||||||||||||||||||||||||:||||||||||||
orf46ng    EQVEKNVQETRRRSQSSQFKAHAQREWENKTGLDFNHFIGGDINKKGAVTGGHSLTRGDV 277

orf46.pep  RVIQQTSAPDKHGXLSSDSGN 126
           ||||||||||||| |||||||
orf46ng    RVIQQTSAPDKHGVLSSDSGN 298

A partial ORF46ng nucleotide sequence <SEQ ID 461> is predicted to encode a protein having
partial amino acid sequence <SEQ ID 462>:



1 . . RRLKHCCHAR LGSAFHRKQD GAHQRFGRYG ATQRLCRSSH PRLGSPKPQC

51 RTRHRSRQQY LYGSHPHQRD WSCPGKIQLG RHHGTSCRAV ADXRDRICER

101 EIRRQRQXCR CRLGKIPSLS IPKYPLKLEQ RYGKENITSS TVPPSNGKNV

151 KLADQRHPKT GVPFDGKGFP NFEKHVKYDT KLDIQELSGG GIPKAKPVFD

201 AKPRWEVDRK LNKLTTREQV EKNVQETRRR SQSSQFKAHA QREWENKTGL

251 DFNHFIGGDI NKKGAVTGGH SLTRGDVRVI QQTSAPDKHG VLSSDSGN*

Further work revealed the complete gonococcal DNA sequence <SEQ ID 463>: 

1 TTGGGCATTT CCCGCAAAAT ATCCCTTATT CTGTCCATAC TGGCAGTGTG 

51 CCTGCCGATG CATGCACACG CCTCAGATTT GGcaAACGAT CCCTTTATCC

101 GgCaggttcT CGaccGTCAG CATTTCGaac ccgacggGAa ATACCaCCTA 

151 TTcggCaGCA GGGGGGAGCT TgccnagcGC aacggccATa tcggattggG 

201 aaacaTAcaa Agccatcagt tGggccacct gatgattcaa caggcggccg 

251 ttgaaggaaA TAtcgGctac attgtccgct tttccgatca cgggcacaaa 

301 ttccattcgc ccttcGAcaa ccaTGCCTCA CATTCCGATT CTGACGAAGC

351 CGGTAGTCCC GTTGACGGAT TCAGCCTTTA CCGCATCCAT TGGGACGGAT 

401 ACGAACACCA TCCCGCCGAC GGCTATGACG GGCCACAGGG CGGCGGCTAT 

451 CCCGCTCCCA AAGGCGCGAG GGATATATAC AGCTACGACA TAAAAGGCGT

501 TGCCCAAAAT ATCCGCCTCA ACCTGACCGA CAACCGCAGC ACCGGACAAC 

551 GGCTTGCCGA CCGTTTCCAC AATGCCGGCG CTATGCTGAC GCAAGGAGTA

601 GGCGACGGAT TCAAACGCGC CACCCGATAC AGCCCCGAGC TGGACAGATC 

651 GGGCAATGCc gccGAAGCCT TCAACGGCAC TGCAGATATC GTCAAAAACA 

701 TCATCGGCGC GGCAGGAGAA ATTGTCGGCG CAGGCGATGC CGTGCagGGT

751 ATAAGCGAAG GCTCAAACAT TGCTGTCATG CACGGCTTGG GTCTGCTTTC 

801 CACCGAAAAC AAGATGGCGC GCATCAACGA TTTGGCAGAT ATGGCGCAAC

851 TCAAAGACTA TGCCGCAGCA GCCATCCGCG ATTGGGCAGT CCAAAACCCC 

901 AATGCCGCAC AAGGCATAGA AGCCGTCAGC AATATCTTTA TGGCAGCCAT 

951 CCCCATCAAA GGGATTGGAG CTGTCCGGGG AAAATACGGC TTGGGCGGCA 

1001 TCACGGCACA TCCTGTCAAG CGGTCGCAGA TGGGCGCGAT CGCATTGCCG 

1051 AAAGGGAAAT CCGCCGTCAG CGACAATTTT GCCGATGCGG CATACGCCAA

1101 ATACCCGTCC CCTTACCATT CCCGAAATAT CCGTTCAAAC TTGGAGCAGC 

1151 GTTACGGCAA AGAAAACATC ACCTCCTCAA CCGTGCCGCC GTCAAACGGC 

1201 AAAAATGTCA AACTGGCAGA CCAACGCCAC CCGAAGACAG GCGTACCGTT

1251 TGACGGTAAA GGGTTTCCGA ATTTTGAGAA GCACGTGAAA TATGATACGA 

1301 AGCTCGATAT TCAAGAATTA TCGGGGGGCG GTATACCTAA GGCTAAGCCT

1351 GTGTTTGATG CGAAACCGAG ATGGGAGGTT GATAGGAAGC TTAATAAATT 

1401 GACAACTCGT GAGCAGGTGG AGAAAAATGT TCAGGAAACG AGAAGAAGGA 

1451 GTCAGAGTAG TCAGTTTAAA GCCCATGCGC AACGAGAATG GGAAAATAAA 

1501 ACAGGGTTAG ATTTTAATCA TTTTATAGGT GGTGATATCA ATAAGAAAGG 

1551 CACAGTAACA GGAGGGCATA GTCTAACCCG TGGTGATGTA CGGGTGATAC

1601 AACAAACCTC GGCACCTGAT AAACATGGGG TTTATCAAGC GACAGTGGAA 

1651 ATTAAAAAGC CTGATGGAAG TTGGGAGGTG AAAACGAAAA AAGGTGGGAA 

1701 AGTGATGACC AAGCACACCA TGTTCCCAAA AGATTGGGAT GAGGCTAGAA 

1751 TTAGGGCTGA AGTTACTTCG GCTTGGGAAA GTAGAATAAT GCTTAAGGAT 

1801 AATAAATGGC AGGGTACAAG TAAATCGGGT ATTAAAATAG AAGGATTTAC

1851 CGAACCTAAT AGAACAGCAT ATCCCATTTA TGAATAG 

This corresponds to the amino acid sequence <SEQ ID 464; ORF46ng-1>:

1 LGISRKISLI LSILAVCLPM HAHASDLAND PFIRQVLDRQ HFEPDGKYHL

51 FGSRGELAXR NGHIGLGNIQ SHQLGHLMIQ QAAVEGNIGY IVRFSDHGHK 

101 FHSPFDNHAS HSDSDEAGSP VDGFSLYRIH WDGYEHHPAD GYDGPQGGGY

151 PAPKGARDIY SYDIKGVAQN IRLNLTDNRS TGQRLADRFH NAGAMLTQGV 

201 GDGFKRATRY SPELDRSGNA AEAFNGTADI VKNIIGAAGE IVGAGDAVQG 

251 ISEGSNIAVM HGLGLLSTEN KMARINDLAD MAQLKDYAAA AIRDWAVQNP 

301 NAAQGIEAVS NIFMAAIPIK GIGAVRGKYG LGGITAHPVK RSQMGAIALP 

351 KGKSAVSDNF ADAAYAKYPS PYHSRNIRSN LEQRYGKENI TSSTVPPSNG

401 KNVKLADQRH PKTGVPFDGK GFPNFEKHVK YDTKLDIQEL SGGGIPKAKP

451 VFDAKPRWEV DRKLNKLTTR EQVEKNVQET RRRSQSSQFK AHAQREWENK 

501 TGLDFNHFIG GDINKKGTVT GGHSLTRGDV RVIQQTSAPD KHGVYQATVE 

551 IKKPDGSWEV KTKKGGKVMT KHTMFPKDWD EARIRAEVTS AWESRIMLKD 

601 NKWQGTSKSG IKIEGFTEPN RTAYPIYE*

ORF46ng-1 and ORF46-1 show 94.7% identity in 227 aa overlap:

10 20 30 40 

orf 4 6-1. pep AVCLPMHAHASXLANDSFIRQVLDRQHFEPDGKYHLFGSRGELAER 

I i I II I i M I I I I II I I I I I I II I I I I I I I I I II I I I I I I I I I 

60 orf 4 6ng-l LGISRKISLILSILAVCLPMHAHASDLANDPFIRQVLDRQHFEPDGKYHLFGSRGELAXR 

10 20 30 40 50 60 

50 60 70 80 90 100
orf46-1.pep QSHIGLGKIQSHQLGNLMIQQAAIKGNIGYIVRFSDHGHEVHSPFDNHASHSDSDEAGSP
-p P 7 :!mj:m|m:MI( t|,::|t||tMIIIIIII: I I I I I I 1 I I I I I M I I I I I 
orf46ng-1 NGHIGLGNIQSHQLGHLMIQQAAVEGNIGYIVRFSDHGHKFHSPFDNHASHSDSDEAGSP
70 80 90 100 110 120

110 120 130 140 150 160

orf4 6-l pep VDGFSLYRIHWDGYEHHPADGYDGPQGGGYPAPKGARDIYSYDIKGVAQNIRLNLTDNRS 
I | i | | | I I I II I i i I III M I I I t I I I I I I I t I I I t I I I t t I ! I I t I I! I M I I I I I I I I 
orf4 6ng-l VDGFSLYRIHWDGYEHHPADGYDGPQGGGYPAPKGARDIYSYDIKGVAQNIRLNLTDNRS 
130 140 150 160 170 180

170 180 190 200 210 220 

orf 4 6-1 pep TGQRLADRFHNAGSMLTQGVGDGFKRATRYSPELDRSGNAAEAFNGTADIVKNIIGAAGE 
| N { I I M I I I I { : I M i I I I I I I I I I I I I M I I I I I I I I I I I I t I I I I I I I I I I I I I I I 
orf46ng-1 TGQRLADRFHNAGAMLTQGVGDGFKRATRYSPELDRSGNAAEAFNGTADIVKNIIGAAGE

190 200 210 220 230 240 






orf 4 6-1. pep I 

I 

orf4 6ng-l IVGAGDAVQGISEGSNIAVMHGLGLLSTENKMARINDLADMAQLKDYAAAAIRDWAVQNP 

250 260 270 280 290 300 



Homology with a predicted ORF from N. meningitidis (strain A) 
ORF46ng-1 shows 87.4% identity over a 486aa overlap with an ORF (ORF46a) from strain A of
N. meningitidis:

10 20 30 40 50 60 

orf 4 6a. pep LGISRKISLILSILAVCLPMHAHASDLANDSFIRQVLDRQHFEPDGKYHLFGSRGELAER 
I I I I I I t I I I I I I I I II I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
30 orf 4 6ng-l LGISRKISLILSILAVCLPMHAHASDLANDPFIRQVLDRQHFEPDGKYHLFGSRGELAXR 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 4 6a . pep SGHIGLGNIQSHQLGNLFIQQAAIKGNIGYIVRFSDHGHEVHSPFDNHASHSDSDEAGSP 
35 : I I I I I I I I I I I I I I : I : I I I 1 I :: I I I I I I M I I I I I I : I I M I I I I I I I II I I I I I I 

orf4 6ng-l NGHIGLGNIQSHQLGHLMIQQAAVEGNIGYIVRFSDHGHKFHSPFDNHASHSDSDEAGSP 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf46a.pep VDGFSLYRIHWDGYEHHPADGYDGPQGGGYPAPKGARDIYSYDIKGVAQNIRLNLTDNRS

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I M I I I M I I I I 
orf 4 6ng-l VDGFSLYRIHWDGYEHHPADGYDGPQGGGYPAPKGARDIYSYDIKGVAQNIRLNLTDNRS 

130 140 150 160 170 180 

45 190 200 210 220 230 240 

orf46a.pep TGQRLVDRFHNTGSMLTQGVGDGFKRATRYSPELDRSGNAAEAFNGTADIVKNIIGAAGE

I I I I I : I I I I I : I : I I I I I II I I I M I II I 1 I I I I I I 1 II I I I I I II I I 1 I I II I I M II 
orf46ng-1 TGQRLADRFHNAGAMLTQGVGDGFKRATRYSPELDRSGNAAEAFNGTADIVKNIIGAAGE

190 200 210 220 230 240 


250 260 270 280 290 300 

orf 4 6a . pep IVGAGDAVQGISEGSNIAVMHGLGLLSTENKMARINDLADMAQLKDYAAAAIRDWAVQNP 
I M I II 1 I I I I I I I I I I I I M I I I I II I I I M ! II I 1 I I I I I I I I I I II II I I M I I I 1 I 
orf4 6ng-l IVGAGDAVQGISEGSNIAVMHGLGLLSTENKMARINDLADMAQLKDYAAAAIRDWAVQNP 
55 250 260 270 280 290 300 

310 320 330 340 350 360 

orf46a.pep NAAQGIEAVSNIFTAVIPVKGIGAVRGKYGLGGITAHPVKRSQMGEIALPKGKSAVSDNF

1 I I I I I I I I I 1 I I I : I I : M I I I I II I 1 I I I I I I I I I I I I I II I I I I M I I I I M I I I 
60 orf4 6ng-l NAAQGIEAVSNIFMAAIPIKGIGAVRGKYGLGGITAHPVKRSQMGAIALPKGKSAVSDNF 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 4 6a . pep ADAAYAKYPSPYHSRNIRSNLEQRYGKENITSSTVPPSNGKNVKLANKRHPKTKVPFDGK 
65 | | | || || | | J i | | | || I I I I II II I I I II I II I! I I I I 1 I II M I I : : I I I I I I I I I I I 

orf46ng-1 ADAAYAKYPSPYHSRNIRSNLEQRYGKENITSSTVPPSNGKNVKLADQRHPKTGVPFDGK
370 380 390 400 410 420

430 440 450 460 470 

orf4 6a.pep GFPNFEKDVKYDTRINTAVPQVN PIDEPVFN — PKGSVGSAHSWSITARIQYAKLP 

t t I ! I I I Mil!::: : ::: I :(!!: |: I : ::|:| I I 
orf4 6ng-l GFPNFEKHVKYDTKLD— IQELSGGGIPKAKPVFDAKPRWEVDRKLN-KLTTREQVEKNV 

430 440 450 460 470 



480 490 500 510 520 530 

orf 4 6a . pep RQGRIRYIPPKNYSPSAPLPKGPNNGYLDKFGNEWTKGPSRTKGQEFEWDVQLSKTGREQ 
:: I I 

orf4 6ng-l QETRRRSQSSQFKAHAQREWENKTGLDFNHFIGGDINKKGTVTGGHSLTRGDVRVIQQTS 
480 490 500 510 520 530 



The complete length ORF46a DNA sequence <SEQ ID 465> is: 



1 TTGGGCATTT CCCGCAAAAT ATCCCTTATT CTGTCCATAC TGGCAGTGTG

51 CCTGCCGATG CATGCACACG CCTCAGATTT GGCAAACGAT TCTTTTATCC 

101 GGCAGGTTCT CGACCGTCAG CATTTCGAAC CCGACGGGAA ATACCACCTA 

151 TTCGGCAGCA GGGGGGAACT TGCCGAGCGC AGCGGTCATA TCGGATTGGG 

201 AAACATACAA AGCCATCAGT TGGGCAACCT GTTCATCCAG CAGGCGGCCA 

251 TTAAAGGAAA TATCGGCTAC ATTGTCCGCT TTTCCGATCA CGGGCACGAA 

301 GTCCATTCCC CCTTCGACAA CCATGCCTCA CATTCCGATT CTGATGAAGC 

351 CGGTAGTCCC GTTGACGGAT TCAGCCTTTA CCGCATCCAT TGGGACGGAT 

401 ACGAACACCA TCCCGCCGAC GGCTATGACG GGCCACAGGG CGGCGGCTAT

451 CCCGCTCCCA AAGGCGCGAG GGATATATAC AGCTACGACA TAAAAGGCGT

501 TGCCCAAAAT ATCCGCCTCA ACCTGACCGA CAACCGCAGC ACCGGACAAC 

551 GGCTTGTCGA CCGTTTCCAC AATACCGGTA GTATGCTGAC GCAAGGAGTA 

601 GGCGACGGAT TCAAACGCGC CACCCGATAC AGCCCCGAGC TGGACAGATC 

651 GGGCAATGCC GCCGAAGCTT TCAACGGCAC TGCAGATATC GTCAAAAACA 

701 TCATCGGCGC GGCAGGAGAA ATTGTCGGCG CAGGCGATGC CGTGCAGGGT

751 ATAAGCGAAG GCTCAAACAT TGCTGTTATG CACGGCTTGG GTCTGCTTTC

801 CACCGAAAAC AAGATGGCGC GCATCAACGA TTTGGCAGAT ATGGCGCAAC 

851 TCAAAGACTA TGCCGCAGCA GCCATCCGCG ATTGGGCAGT CCAAAACCCC 

901 AATGCCGCAC AAGGCATAGA AGCCGTCAGC AATATCTTTA CGGCAGTCAT 

951 CCCCGTCAAA GGGATTGGAG CTGTTCGGGG AAAATACGGC TTGGGCGGCA

1001 TCACGGCACA TCCTGTCAAG CGGTCGCAGA TGGGCGAGAT CGCATTGCCG 

1051 AAAGGGAAAT CCGCCGTCAG CGACAATTTT GCCGATGCGG CATACGCCAA 

1101 ATACCCGTCC CCTTACCATT CCCGAAATAT CCGTTCAAAC TTGGAGCAGC 

1151 GTTACGGCAA AGAAAACATC ACCTCCTCAA CCGTGCCGCC GTCAAACGGA 

1201 AAGAATGTGA AACTGGCAAA CAAACGCCAC CCGAAGACCA AAGTGCCGTT 

1251 TGACGGTAAA GGGTTTCCGA ATTTTGAAAA AGACGTAAAA TACGATACGA 

1301 GAATTAATAC CGCTGTACCA CAAGTGAATC CTATAGATGA ACCCGTCTTT 

1351 AATCCTAAAG GTTCTGTCGG ATCGGCTCAT TCTTGGTCTA TAACTGCCAG 

1401 AATTCAATAC GCAAAATTAC CAAGGCAAGG TAGAATCAGA TATATCCCAC 

1451 CTAAAAATTA CTCTCCTTCA GCACCGCTAC CAAAAGGACC TAATAATGGA 

1501 TATTTGGATA AATTTGGTAA TGAATGGACT AAAGGTCCAT CAAGAACTAA 

1551 AGGTCAAGAA TTTGAATGGG ATGTTCAATT GTCTAAAACA GGAAGAGAGC 

1601 AACTTGGATG GGCTAGTAGG GATGGTAAGC ATTTAAATAT ATCAATTGAT 

1651 GGAAAGATTA CACACAAATG A 

This corresponds to the amino acid sequence <SEQ ID 466>: 



1 LGISRKISLI LSILAVCLPM HAHASDLAND SFIRQVLDRQ HFEPDGKYHL

51 FGSRGELAER SGHIGLGNIQ SHQLGNLFIQ QAAIKGNIGY IVRFSDHGHE 

101 VHSPFDNHAS HSDSDEAGSP VDGFSLYRIH WDGYEHHPAD GYDGPQGGGY 

151 PAPKGARDIY SYDIKGVAQN IRLNLTDNRS TGQRLVDRFH NTGSMLTQGV 

201 GDGFKRATRY SPELDRSGNA AEAFNGTADI VKNIIGAAGE IVGAGDAVQG 

251 ISEGSNIAVM HGLGLLSTEN KMARINDLAD MAQLKDYAAA AIRDWAVQNP 

301 NAAQGIEAVS NIFTAVIPVK GIGAVRGKYG LGGITAHPVK RSQMGEIALP 

351 KGKSAVSDNF ADAAYAKYPS PYHSRNIRSN LEQRYGKENI TSSTVPPSNG 

401 KNVKLANKRH PKTKVPFDGK GFPNFEKDVK YDTRINTAVP QVNPIDEPVF

451 NPKGSVGSAH SWSITARIQY AKLPRQGRIR YIPPKNYSPS APLPKGPNNG

501 YLDKFGNEWT KGPSRTKGQE FEWDVQLSKT GREQLGWASR DGKHLNISID 

551 GKITHK*

Based on this analysis, including the presence of an RGD sequence in the gonococcal protein, typical
of adhesins, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
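The presence and position of a short motif such as RGD can be confirmed by a plain substring scan of the listed sequence. The following is a minimal sketch; the function name `find_motif` and its parameters are illustrative only, and it reports 1-based residue positions using the numbering of the sequence listing.

```python
from typing import List


def find_motif(sequence: str, motif: str = "RGD", first_position: int = 1) -> List[int]:
    """Return the residue positions at which each occurrence of the motif starts."""
    # Remove the spaces and position numbers used in the sequence listing.
    cleaned = "".join(ch for ch in sequence.upper() if ch.isalpha() or ch == "*")
    hits = []
    start = cleaned.find(motif)
    while start != -1:
        hits.append(first_position + start)
        start = cleaned.find(motif, start + 1)  # allow overlapping occurrences
    return hits


# Row 501 of SEQ ID 464 (ORF46ng-1) as listed above.
row_501 = "TGLDFNHFIG GDINKKGTVT GGHSLTRGDV RVIQQTSAPD KHGVYQATVE"
print(find_motif(row_501, first_position=501))
```

For this row the scan reports a single RGD starting at residue 527 of the gonococcal protein.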

Example 56 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 467>:

1 ATGAATATTC ACACCCTGCT CTCCAAACAA TGGACGCTGC CGCCATTCCT 

51 GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTTGCC CCCAATGCGG 

101 TGTTTTGGGT TTTGGCACTG CTGACCGCCA CCGCCCGCCC GATTGTCAAT 

151 TTGGACTATC TTCCCGCCGC GCTGCTGATC GCCCTGCCTT GGCGTTTCGT 

201 CAAAATTGCC GGCGTATTGG CGTTTTGGCT GGCGGTTTTG TTTGACGGGC 

251 TGATGATGGT GATCCAACTC TTCCCTTTTA TGGATCTCAT CGGCGCCATC

301 AACCTCGTCC CCTTCATCCT GACCGCCCCC GCCCCTTATC AGATAATGAC 

351 CGGGCTG... 

This corresponds to the amino acid sequence <SEQ ID 468; ORF48>: 

1 MNIHTLLSKQ WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATARPIVN 

51 LDYLPAALLI ALPWRFVKIA GVLAFWLAVL FDGLMMVIQL FPFMDLIGAI 

101 NLVPFILTAP APYQIMTGL . . . 

Further work revealed the complete nucleotide sequence <SEQ ID 469>: 

1 ATGAATATTC ACACCCTGCT CTCCAAACAA TGGACGCTGC CGCCATTCCT 

51 GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTTGCC CCCAATGCGG 

101 TGTTTTGGGT TTTGGCACTG CTGACCGCCA CCGCCCGCCC GATTGTCAAT 

151 TTGGACTATC TTCCCGCCGC GCTGCTGATC GCCCTGCCTT GGCGTTTCGT 

201 CAAAATTGCC GGCGTATTGG CGTTTTGGCT GGCGGTTTTG TTTGACGGGC 

251 TGATGATGGT GATCCAACTC TTCCCTTTTA TGGATCTCAT CGGCGCCATC 

301 AACCTCGTCC CCTTCATCCT GACCGCCCCC GCCCCTTATC AGATAATGAC 

351 CGGGCTGTTG CTGCTGTATA TGCTGGCGAT GCCGTTTGTG TTGCAGAAAG 

401 CCGCCGCCAA AACCGACTTC CGGCACATTG CCGTCTGCGC CGCCGTTGTG 

451 GCGGCAGCCG GCTATTTCAC CGGCCATTTG AGTTACTACG ACCGGGGTCG 

501 GATGGCCAAT ATCTTCGGCG CAAACAACTT CTACTACGCC AAAAGTCAGG 

551 CGATGCTCTA CACCGTCAGC CAGAATGCCG ACTTTATTAC CGCCGGCCTG 

601 GTCGATCCCG TCTTCCTCCC CTTGGGCAAT CAACAGCGTG CCGCCACGCA 

651 TCTGAACGAG CCGAAATCTC AAAAAATCCT CTTTATCGTC GCCGAATCTT 

701 GGGGGCTGCC GGCCAATCCC GAACTTCAAA ACGCCACTTT TGCCAAACTG 

751 CTGGCGCAAA AAGACCGTTT TTCGGTTTGG GAAAGCGGCA GTTTTCCCTT 

801 CATCGGCGCG ACGGTCGAAG GCGAAATGCG CGAACTGTGT GCCTACGGCG 

851 GTTTGCGCGG GTTCGCACTG CGCCGCGCGC CCGACGAAAA ATTTGCCCGC 

901 TGCCTCCCCA ACCGTTTGAA ACAAGAAGGT TACGCCACCT TTGCGATGCA 

951 CGGCGCGGGC AGTTCGCTTT ACGACCGCTT CAGCTGGTAT CCGAGGGCGG 

1001 GCTTTCAAGA AATCAAAACC GCCGAAAACC TGATCGGTAA AAAAACCTGC 

1051 GCCATTTTCG GCGGCGTGTG CGACAGCGAG CTGTTCGGCG AAGTGTCGGC 

1101 ATTTTTCAAA AAACACGACA AGGGACTGTT TTACTGGATG ACGCTGACCA 

1151 GCCACGCCGA CTATCCCGAA TCCGACATTT TCAACCACAG GCTCAAATGC 

1201 ACCGAATATG GCCTGCCCGC CGAAACCGAC CTCTGCCGCA ATTTCAGCCT 

1251 GCACACCCAA TTCTTCGACC AACTGGCGGA TTTGATCCAA CGCCCCGAAA 

1301 TGAAAGGCAC GGAAGTCATC ATCGTCGGCG ACCATCCGCC GCCCGTCGGC 

1351 AACCTCAATG AAACCTTCCG CTACCTCAAA CAGGGGCACG TCGCCTGGCT 

1401 GAACTTCAAA ATCAAATAA 

This corresponds to the amino acid sequence <SEQ ID 470; ORF48-1>:



1 MNIHTLLSKQ WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATARPIVN

51 LDYLPAALLI ALPWRFVKIA GVLAFWLAVL FDGLMMVIQL FPFMDLIGAI

101 NLVPFILTAP APYQIMTGLL LLYMLAMPFV LQKAAAKTDF RHIAVCAAVV

151 AAAGYFTGHL SYYDRGRMAN IFGANNFYYA KSQAMLYTVS QNADFITAGL

201 VDPVFLPLGN QQRAATHLNE PKSQKILFIV AESWGLPANP ELQNATFAKL

251 LAQKDRFSVW ESGSFPFIGA TVEGEMRELC AYGGLRGFAL RRAPDEKFAR

301 CLPNRLKQEG YATFAMHGAG SSLYDRFSWY PRAGFQEIKT AENLIGKKTC

351 AIFGGVCDSE LFGEVSAFFK KHDKGLFYWM TLTSHADYPE SDIFNHRLKC

401 TEYGLPAETD LCRNFSLHTQ FFDQLADLIQ RPEMKGTEVI IVGDHPPPVG

451 NLNETFRYLK QGHVAWLNFK IK*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF48 shows 94.1% identity over a 119aa overlap with an ORF (ORF48a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf48.pep MNIHTLLSKQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI
I I I I I I I II i M I I i I I I I I I II i I I i ( I M I ! I I I I I II M M II M I i MINIM 
orf48a MNIHTLLSKQWTLPPFLPKRLLLSLLILLXPNAVFWVLALLTATARPIVNLXYLPAALLI

10 20 30 40 50 60 



70 80 90 100 110 119 

orf48.pep ALPWRFVKIAGVLAFWLAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGL
INN II I Mil 1 I I f U M i f I M 11 11 M I I I I M I M I M MM M M M I 
orf48a ALPWRXVKIXGVLAXWLAVLFDGLMMVIQLFPFMDLIGAINLVPFIXTAPALYQIMTGLL
70 80 90 100 110 120 



orf48a LLYMLAMPFVLQKAAAKTDFRHIAACAAVVVAAGYFTGHLSXYDRGRMANIFGANNFYYA
130 140 150 160 170 180 

The complete length ORF48a nucleotide sequence <SEQ ID 471> is:

1 ATGAATATTC ACACCCTGCT CTCCAAACAA TGGACGCTGC CGCCATTCCT 

51 GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTNNCC CCCAATGCGG 

101 TGTTTTGGGT TTTGGCACTG CTGACCGCCA CCGCCCGCCC GATTGTCAAT 

151 TTGGANTACC TTCCCGCCGC GCTGCTGATC GCCCTGCCTT GGCGTNTCGT 

201 CAAAATTGNC GGCGTATTGG CGTNTTGGCT GGCGGTTTTG TTTGACGGGC 

251 TGATGATGGT GATCCAACTC TTCCCTTTTA TGGATCTCAT CGGCGCCATC 

301 AACCTCGTCC CCTTCATCNT GACCGCCCCC GCCCTTTATC AGATAATGAC 

351 CGGGCTGTTA CTGCTGTATA TGCTGGCGAT GCCGTTTGTG TTGCAGAAAG 

401 CCGCCGCCAA AACCGACTTC CGACACATTG CCGCCTGTGC CGCCGTTGTG

451 GTGGCAGCCG GCTATTTTAC CGGCCATTTG AGTTANTACG ACCGGGGGCG 

501 GATGGCCAAT ATCTTCGGCG CAAACAACTT CTATTACGCC AAAAGTCAGG 

551 CGATGCTCTA CACCGTCAGC CAGAATGCCG ACTTTATTAC CGCCGGCCTG 

601 GTCGATCCCG TCTTCCTCCC CTTGGGCAAT CAACAGCGTG CCGCCACGCA 

651 TCTGAACGAG CCGAAATCTC AAAAAATCCT CTTTATCGTC GCCGAATCTT 

701 GGGGGCTGCC GGCCAATCCC GAACTTCAAA ACGCCACTTT TGCCAAACTG 

751 CTGGCGCAAA AAGANCGTTT TTCGGTTTGG GAAAGCGGCA GTTTTCCCTT

801 CATCGGCGCG ACGATCGAAG GCGAAATGCG CGAACTGTGT GCCTACGGCG 

851 GTTTGCGCGG GTTCGCACTG CGCCGCGCGC CCGACGAAAA ATTTGCCCGC 

901 TGCCTCCCCA ACCGTTTGAA ACAAGAAGGT TACGCCACCT TTGCGATGCA 

951 CGGCGCGGGC AGTTCGCTTT ACGACCGCTT CAGCTGGTAT CCGAGGGCGG 

1001 GCTTTCAAGA AATCAAAACC GCCGAAAACC TGATCGGTAA AAAAACCTGC 

1051 GCCATTTTCG GCGGCGTGTG CGACAGCGAG CTGTTCGGCG AAGTGTCGGC 

1101 ANTTTTCAAA AAACACGACA AGGGACTGTT TTACTGGATG ACGCTGACCA 

1151 GCCACGCCGA CTATCCCGAA TCNGACATTT TCAACCACAG GCTCAAATGC 

1201 ACCGAATATG GCCTGCCCGC CGAAACCGAC NTCTGCCGCA ATTTCAGCCT 

1251 GCACACCCAA TTCTTCGACC AACTGGCGGA TTTGATCCAA CGCCCCGAAA 

1301 TGAAAGGCAC GGAAGTCATC ATCGTCGGCG ACCATCCGCC GCCCGTCGGC 

1351 AACCTCAATG AAACCTTCCG CTACCTCAAA CAGGGGCACG TCGNCTGGCT 

1401 GAACTTCAAA ATCAAATAA

This encodes a protein having amino acid sequence <SEQ ID 472>: 



1 MNIHTLLSKQ WTLPPFLPKR LLLSLLILLX PNAVFWVLAL LTATA RPIVN 

51 LXYLPAALLI ALPWRXVKIX G VLAXWLAVL FDGLMMVI Q L FPFMDLIGAI 

101 NLVPFIXTAP ALYQ IMTGLL LLYMLAMPFV L QKAAAKTDF R HIAACAAVV 

151 VAAGYFTGHL SXYDRGRMAN IFGANNFYYA KSQAMLYTVS QNADFITAGL 

201 VDPVFLPLGN QQRAATHLNE PKSQKILFIV AESWGLPANP ELQNATFAKL 

251 LAQKXRFSVW ESGSFPFIGA TIEGEMRELC AYGGLRGFAL RRAPDEKFAR 

301 CLPNRLKQEG YATFAMHGAG SSLYDRFSWY PRAGFQEIKT AENLIGKKTC 

351 AIFGGVCDSE LFGEVSAXFK KHDKGLFYWM TLTSHADYPE SDIFNHRLKC 
401 TEYGLPAETD XCRNFSLHTQ FFDQLADLIQ RPEMKGTEVI IVGDHPPPVG
451 NLNETFRYLK QGHVXWLNFK IK*
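
The relationship between SEQ ID 471 and SEQ ID 472 can be illustrated with a short translation sketch. It assumes the standard genetic code and a simple rule for ambiguous bases that is not stated in the specification: a codon containing N is translated only if every possible completion encodes the same residue, and is otherwise written as X. The function names are illustrative only.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered T, C, A, G at each position.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate_codon(codon):
    """Translate one codon; N is resolved only when unambiguous."""
    if len(codon) != 3 or any(b not in "TCAGN" for b in codon):
        return "X"
    choices = [BASES if b == "N" else b for b in codon]
    residues = {CODON_TABLE["".join(c)] for c in product(*choices)}
    return residues.pop() if len(residues) == 1 else "X"

def translate(dna):
    """Translate a DNA string in frame 1, ignoring whitespace."""
    dna = "".join(dna.split()).upper()
    return "".join(translate_codon(dna[i:i + 3])
                   for i in range(0, len(dna) - 2, 3))

# Nucleotides 1-90 of SEQ ID 471 as printed above.
ORF48A_START = (
    "ATGAATATTC ACACCCTGCT CTCCAAACAA TGGACGCTGC CGCCATTCCT "
    "GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTNNCC"
)
print(translate(ORF48A_START))
```

Under these assumptions the codon CTN (nucleotides 85-87) still yields L, whereas NCC (nucleotides 88-90) yields X, which agrees with residues 29 and 30 of SEQ ID 472.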

ORF48a and ORF48-1 show 96.8% identity in 472 aa overlap: 

10 20 30 40 50 60 

orf48a.pep MNIHTLLSKQWTLPPFLPKRLLLSLLILLXPNAVFWVLALLTATARPIVNLXYLPAALLI

I | M I II M I I M I I I I I I M I M I ! t I I I I 1 I I I M i M I M 1 I I M I I I I I II 1 M 
orf 4 8-1 MNIHTLLSKQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI 

10 20 30 40 50 60 

70 80 90 100 110 120

orf 4 8a . pep ALPWRXVKIXGVLAXWLAVLFDGLMMVIQLFPFMDLIGAINLVPFIXTAPALYQIMTGLL 
Mill III MM I M II M I I M I M I M I I I M II I M I I I I I II I I M M M I 
orf 48-1 ALPWRFVKIAGVLAFWLAVLFDGLMMVIOLFPFMDLIGAINLVPFILTAPAPYQIMTGLL 

70 80 90 100 110 120 


130 140 150 160 170 180 

orf48a.pep LLYMLAMPFVLQKAAAKTDFRHIAACAAVVVAAGYFTGHLSXYDRGRMANIFGANNFYYA

I { I I I I i ! ! I I I I I I I ! I ! I ! I I I : ! I M I : I I I I I i I I II I I M II M M I M I M M 
orf48-1 LLYMLAMPFVLQKAAAKTDFRHIAVCAAVVAAAGYFTGHLSYYDRGRMANIFGANNFYYA

130 140 150 160 170 180

190 200 210 220 230 240 

orf 48a , pep KSQAMLYTVSQNADFITAGLVDPVFLPLGNQQRAATHLNEPKSQKILFIVAESWGLPANP 

II I M I I M I M M M I M M II II II M I M I I M II M M M II M I I M M I II I II 
orf48-1 KSQAMLYTVSQNADFITAGLVDPVFLPLGNQQRAATHLNEPKSQKILFIVAESWGLPANP

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 4 8a . pep ELQNATFAKLLAQKXRFSVWESGSFPFIGATIEGEMRELCAYGGLRGFALRRAPDEKFAR 
I II M I M II I II I II I M II i M I II I M : I M I M M II II I M M I M II I I I M I

orf48-1 ELQNATFAKLLAQKDRFSVWESGSFPFIGATVEGEMRELCAYGGLRGFALRRAPDEKFAR

250 260 270 280 290 300 

310 320 330 340 350 360 

orf48a.pep CLPNRLKQEGYATFAMHGAGSSLYDRFSWYPRAGFQEIKTAENLIGKKTCAIFGGVCDSE

I M M M M M I M I I II II II II M I M II I M II M I II II M II I I M I M M M M
orf48-1 CLPNRLKQEGYATFAMHGAGSSLYDRFSWYPRAGFQEIKTAENLIGKKTCAIFGGVCDSE

310 320 330 340 350 360 

370 380 390 400 410 420

orf 4 8a. pep LFGEVSAXFKKHDKGLFYWMTLTSHADYPESDIFNHRLKCTEYGLPAETDXCRNFSLHTQ 
I II I I II I II I M I M I M II M I M M I I I I I I I I II I II II I I II M II I M M M 
orf 4 8-1 LFGEVSAFFKKHDKGLFYWMTLTSHADYPESDIFNHRLKCTEYGLPAETDLCRNFSLHTQ 
370 380 390 400 410 420 


430 440 450 460 470 

orf 48a - pep FFDQLADLIQRPEMKGTEVIIVGDHPPPVGNLNETFRYLKQGHVXWLNFKIKX 
I I I M I I II II I I II I I M M M I I M M I II I I I M M I I I I I I I M M I I 
orf48-1 FFDQLADLIQRPEMKGTEVIIVGDHPPPVGNLNETFRYLKQGHVAWLNFKIKX

430 440 450 460 470
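
A percent-identity figure of the kind quoted above can be recomputed from any gapless stretch of an alignment. The sketch below is only an illustration: it assumes two already-aligned strings of equal length and counts an undetermined residue (X) as a mismatch, which may differ from the scoring used by the alignment program that produced the figures in this example. The function name is hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Return (matches, percent identity) for two gapless aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    # An X never equals a determined residue, so it counts as a mismatch.
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return matches, 100.0 * matches / len(seq_a)

# Residues 1-60 of ORF48a and ORF48-1 as shown in the alignment above.
orf48a = "MNIHTLLSKQWTLPPFLPKRLLLSLLILLXPNAVFWVLALLTATARPIVNLXYLPAALLI"
orf48_1 = "MNIHTLLSKQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI"

matches, pct = percent_identity(orf48a, orf48_1)
print(f"{matches}/{len(orf48a)} identical ({pct:.1f}%)")
```

For this 60-residue window the only differences are the two undetermined positions in ORF48a (residues 30 and 52).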

Homology with a predicted ORF from N. gonorrhoeae 

ORF48 shows 97.5% identity over a 119aa overlap with a predicted ORF (ORF48ng) from N. 
gonorrhoeae: 

orf48.pep MNIHTLLSKQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI 60

I M I : I I I : II II I I M I M I II M II M I M M I I I I I M I II I M I II I I M I I II I I 
orf 4 8ng MNIHALLSEQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI 60 

orf 4 8 . pep ALPWRFVKIAGVLAFWLAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGL 119 
II M I M I I M I II M I II I M I I M II II M M I II I M I II I II I II II I I I I M I

orf 4 8ng ALPWRFVKIAGVLAFWPAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGLL 120 
The ORF48ng nucleotide sequence <SEQ ID 473> was predicted to encode a protein having amino
acid sequence <SEQ ID 474>:

1 MNIHALLSEQ WTLPPFLPKR LLLSLL ILLA PNAVFWVLAL LTATARPIVN
51 LDYLPAALLI ALPWRFVKIA G VLAFWPAVL FDGLMMVI Q L FPFMDLIGAI
101 NLVPFILTAP APYQ IMTGLL LLYMLAMPFV L QKAAVKTDF RHIAVCAAVV
151 AAARYFTGPF ELLRTGGRWQ YVQHRRLLLS GSRASFRRRQ KADVLRRLGN 
201 PYASMGNGG . . 

Further work identified the complete gonococcal DNA sequence <SEQ ID 475>: 



1 ATGAATATTC ACGCCCTGCT CTCCGAACAA TGGACGCTGC CGCCATTCCT

51 GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTGGCC CCCAATGCGG

101 TGTTTTGGGT TTTGGCACTG CTGACCGCCA CCGCCCGCCC GATTGTCAAT

151 TTGGACTACC TTCCCGCCGC GCTGCTGATC GCCCTGCCTT GGCGTTTCGT

201 CAAAATTGCC GGCGTATTGG CGTTTTGGCC GGCGGTTTTG TTTGACGGGC

251 TGATGATGGT GATCCAACTC TTCCCTTTTA TGGACCTCAT CGGCGCCATC

301 AACCTCGTCC CCTTCATCCT GACCGCCCCC GCCCCTTATC AGATAATGAC

351 CGGGCTGTTG CTGCTGTATA TGCTGGCGAT GCCGTTTGTG TTGCAAAAAG

401 CCGCCGTCAA AACCGACTTC CGACACATTG CCGTCTGTGC CGCCGTTGTG

451 GCGGCAGCCG GCTATTTCAC CGGCCATTTG AGTTACTACG ACCGGGGGCG

501 GATGGCCAAT ATCTTCGGCG CAAACAACTT CTATTACGCc aAAAGTCAGG

551 CGATGCTCTA CACCGTCAGC CAGAATGCCG ACTTTATTAC CGCCGgcctG

601 GTCGACCCCG TCTTCCTCCC CTTGGGCAAT CAGCAGCGTG CCGCCACGCG

651 GCTGAGTGAG CCGAAATCTC AAAAAATCCT CTTTATCGTC GCCGAATCTT

701 GGGGGCTGCC GGGCAATCCC GAGCTTCAAA ACGCCACTTT TGCCAAACTG

751 CTGGCGCAAA AAGACCGTTT TTCGGTTTGG GAAAGCGGCA GTTTTCCCTT

801 CATCGGCGCG ACGGTCGAAG GCGAAATGCG CGAATTGTGC GCCTACGGCG

851 GTTTGCGCGG GTTCGCACTG CGCCGCGCGC CCGACGAAAA ATTTGCCCGC

901 TGCCTCCCCA ACCGTTTGAA ACAAGAAGGT TACGCCACCT TTGCGATGCA

951 CGGCGCGGGT AGTTCGCTTT ACGACCGCTT CAGCTGGTAT CCGAGGGCGG

1001 GCTTTCAAAA AATCAAAACC GCCGAAAACC TGATCGGTAA AAAAACCTGC

1051 GCCATTTTCG GCGGCGTGTG CGACAGCGAG CTGTTCGGCG AAGTGTCGGC

1101 ATTTTTCAAA AAACACGACA AGGGACTGTT TTACTGGATG ACGCTGACCA

1151 GCCACGCCGA CTATCCCGAA TCCGACATTT TCAACCACAG GCTCAAATGC

1201 ACCGAATACG GCCTGCCCGC CGAAACCGAC CTCTGCCGCA ATTTCAGCCT

1251 GCACACCCAA TtcttcgACC AACTGGCGGA TTTGATCCGA CGCCCCGAAA

1301 TGAAAGGCAC GGAAGTCATC ATCGTCGGCG ACCATCCGCC GCCCGTCGGC

1351 AACCTCAATG AAACCTTCCG CTACCTCAAA CAGGGACACG TCGCCTGGCT

1401 GCACTTCAAA ATCAAATAA

This encodes a protein having amino acid sequence <SEQ ID 476; ORF48ng-1>:



1 MNIHALLSEQ WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATARPIVN 

51 LDYLPAALLI ALPWRFVKIA GVLAFWPAVL FDGLMMVIQL FPFMDLIGAI 

101 NLVPFILTAP APYQIMTGLL LLYMLAMPFV LQKAAVKTDF RHIAVCAAVV 

151 AAAGYFTGHL SYYDRGRMAN IFGANNFYYA KSQAMLYTVS QNADFITAGL 

201 VDPVFLPLGN QQRAATRLSE PKSQKILFIV AESWGLPGNP ELQNATFAKL 

251 LAQKDRFSVW ESGSFPFIGA TVEGEMRELC AYGGLRGFAL RRAPDEKFAR 

301 CLPNRLKQEG YATFAMHGAG SSLYDRFSWY PRAGFQKIKT AENLIGKKTC 

351 AIFGGVCDSE LFGEVSAFFK KHDKGLFYWM TLTSHADYPE SDIFNHRLKC 

401 TEYGLPAETD LCRNFSLHTQ FFDQLADLIR RPEMKGTEVI IVGDHPPPVG 

451 NLNETFRYLK QGHVAWLHFK IK* 

ORF48ng-1 and ORF48-1 show 97.9% identity in 472 aa overlap:



10 20 30 40 50 60 

orf 48-1 .pep MNIHTLLSKQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI 
I I t I : M I :( I I I I I I I I I I I I I I I I I I I I I I t t It I II I I I I I I I II I I I I M I I i I I t 
orf48ng-l MNIHALLSEQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 4 8-1 .pep ALPWRFVKIAGVLAFWLAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGLL 
1 I I I I I I I I I I I I I I I f I I I I I I I I I ! I I II I I I I I I I 1 II I I I I I I I I I I I I I I I I M 
orf48ng-l ALPWRFVKIAGVLAFWPAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGLL 

70 80 90 100 110 120 
130 140 150 160 170 180

orf48-1.pep LLYMLAMPFVLQKAAAKTDFRHIAVCAAVVAAAGYFTGHLSYYDRGRMANIFGANNFYYA
I I I I t I I I I I I I t I I : I I I t I I I I I I ! I 11 I I I I I 1 I ! I < I ! I I I 1 M 1 I I I I M I I t I I 
orf48ng-1 LLYMLAMPFVLQKAAVKTDFRHIAVCAAVVAAAGYFTGHLSYYDRGRMANIFGANNFYYA
130 140 150 160 170 180

190 200 210 220 230 240 

orf 48-1. pep KSQAMLYTVSQNADFITAGLVDPVFLPLGNQQRAATHLNEPKSQKILFIVAESWGLPANP 
f M I II I I I I I i I II t I I II ! I I I I I ! I M I I M I I : I : I I II I I I I 1 II I II I M I : I I 
orf48ng-1 KSQAMLYTVSQNADFITAGLVDPVFLPLGNQQRAATRLSEPKSQKILFIVAESWGLPGNP

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 48-1 . pep ELQNATFAKLLAQKDRFSVWESGSFPFIGATVEGEMRELCAYGGLRGFALRRAPDEKFAR 
I I I M I I I I I I I II I I I I M I I I I M II I I I II I I II I I I I I I I I I M I I I I U I I I II I

orf4 8ng-l ELQNATFAKLLAQKDRFSVWESGSFPFIGATVEGEMRELCAYGGLRGFALRRAPDEKFAR 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf48-1.pep CLPNRLKQEGYATFAMHGAGSSLYDRFSWYPRAGFQEIKTAENLIGKKTCAIFGGVCDSE

||||||||||||||||||||||||||||||||||||:|||||||||||||||||||||||
orf48ng-1 CLPNRLKQEGYATFAMHGAGSSLYDRFSWYPRAGFQKIKTAENLIGKKTCAIFGGVCDSE

310 320 330 340 350 360 

370 380 390 400 410 420

orf48-1.pep LFGEVSAFFKKHDKGLFYWMTLTSHADYPESDIFNHRLKCTEYGLPAETDLCRNFSLHTQ
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf48ng-1 LFGEVSAFFKKHDKGLFYWMTLTSHADYPESDIFNHRLKCTEYGLPAETDLCRNFSLHTQ

370 380 390 400 410 420 


430 440 450 460 470 

orf 48-1. pep FFDQLADLIQRPEMKGTEVIIVGDHPPPVGNLNETFRYLKQGHVAWLNFKIKX 
I I I I I I I I I : I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I 
orf4 8ng-l FFDQLADLIRRPEMKGTEVIIVGDHPPPVGNLNETFRYLKQGHVAWLHFKIKX 
430 440 450 460 470

Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and two putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 
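
The specification does not state which method was used to predict the transmembrane domains. As one possible illustration only, the sketch below scans a sequence with the Kyte-Doolittle hydropathy scale, using a 19-residue window and a mean-hydropathy cut-off of 1.6; the window size, cut-off, and function names are assumptions and are not taken from this document.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=19):
    """Mean hydropathy of each full window; unknown residues score 0."""
    scores = [KD.get(aa, 0.0) for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]

def hydrophobic_windows(seq, window=19, threshold=1.6):
    """1-based start positions of windows at or above the threshold."""
    return [i + 1 for i, mean in enumerate(hydropathy_profile(seq, window))
            if mean >= threshold]

# Residues 1-50 of ORF48ng-1 (SEQ ID 476).
orf48ng_nterm = "MNIHALLSEQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVN"
profile = hydropathy_profile(orf48ng_nterm)
print(hydrophobic_windows(orf48ng_nterm))
```

Windows that pass the cut-off are only candidate hydrophobic segments; a scan of this kind does not by itself distinguish a leader sequence from a membrane-spanning region.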



Example 57

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 477>: 

1 ..GTGAGCGGAC GTTACCGCGC TTTGGATCGC GTTTCCAAAA TCATCATCGT 

51 TACTTTGAGT ATCGCCACGC TTGCCGCCGC CGGCATCGCT ATGTCGCGCG 

101 GTATGCAGAT GCAGTCCGAT TTTATCGAGC CGACACCGTG GACGCTTGCC 

151 GGTTTGGGCT TCCTGATCGC GCTGATGGGC TGGATGCCCG CGCCGATTGA

201 AATTTCCGCC ATCAATTCTT TGTGGGTAAC CGAAAAACAA CGCATCAATC 

251 CTTCCGAATA CCGCGACGGG ATTTTTGAAT TCAACGTCGG TTATATCGCC 

301 AGTGCGGTTT TGGCTTTGGT TTTCCTTGCA CTGGGCGC.G TAGCGCCGAA 

351 CGGCAACGGC GA.ACAGTGC AGATGGCGGG CGGCAAATAT AACGGGCAAT

401 TGATCAATAT GTACGCC..

This corresponds to the amino acid sequence <SEQ ID 478; ORF53>: 

1 ..VSGRYRALDR VSKIIIVTLS IATLAAAGIA MSRGMQMQSD FIEPTPWTLA
51 GLGFLIALMG WMPAPIEISA INSLWVTEKQ RINPSEYRDG IFEFNVGYIA
101 SAVLALVFLA LGXVAPNGNG XTVQMAGGKY NGQLINMYA..

Further work revealed the complete nucleotide sequence <SEQ ID 479>:



1 ATGTCCGAAC AACATATTTC GACTTGGAAA AGTAAAATCA ACGCATTGGG 
51 TCCGGGGATC ATGATGGCTT CGGCGGCGGT CGGCGGTTCG CACCTGATTG

101 CCTCGACGCA GGCGGGCGCG CTTTACGGCT GGCAGATCGC GCTCATCATC 

151 ATCCTGACCA ACCTCTTCAA ATACCCGTTT TTCCGCTTCA GCGCGCATTA 

201 CACGCTGGAC ACGGGCAAGA GCCTGATTGA AGGTTATGCC GAGAAAAGCC 

251 GCGTTTATTT GTGGGTATTC CTGATTTTGT GCATCCTCTC CGCCACGATT 

301 AACGCGGGCG CGGTCGCCAT TGTAACCGCC GCCATCGTCA AAATGGCGAT 

351 TCCCTCGCTG ATGTTTGATG CCGGCACGGT TGCCGCCTTG ATTATGGCAT 

401 CCTGCCTGAT TATTTTGGTG AGCGGACGTT ACCGCGCTTT GGATCGCGTT 

451 TCCAAAATCA TCATCGTTAC TTTGAGTATC GCCACGCTTG CCGCCGCCGG 

501 CATCGCTATG TCGCGCGGTA TGCAGATGCA GTCCGATTTT ATCGAGCCGA 

551 CACCGTGGAC GCTTGCCGGT TTGGGCTTCC TGATCGCGCT GATGGGCTGG 

601 ATGCCCGCGC CGATTGAAAT TTCCGCCATC AATTCTTTGT GGGTAACCGA 

651 AAAACAACGC ATCAATCCTT CCGAATACCG CGACGGGATT TTTGATTTCA 

701 ACGTCGGTTA TATCGCCAGT GCGGTTTTGG CTTTGGTTTT CCTTGCACTG 

751 GGCGCGTTTG TGCAATACGG CAACGGCGAA GCAGTGCAGA TGGCGGGCGG 

801 CAAATATATC GGGCAATTGA TCAATATGTA CGCCGTTACC ATCGGCGGCT 

851 GGTCGCGCCC GCTGGTGGCG TTTATCGCGT TTGCCTGTAT GTACGGCACG 

901 ACGATTACCG TCGTGGACGG CTATGCCCGT GCCATTGCCG AACCCGTGCG 

951 CCTGCTGCGC GGAAAAGACA AAACGGGCAA CGCCGAATTC TTTGCCTGGA 

1001 ATATTTGGGT GGCGGGCAGC GGTTTGGCGG TGATTTTCTG GTTTGACGGC 

1051 GTAATGGCGA ATCTGCTCAA ATTTGCGATG ATTGCCGCTT TTGTGTCCGC 

1101 CCCTGTGTTT GCCTGGCTGA ATTACCGTTT GGTTAAAGGT GATGAAAAAC 

1151 ACAAACTCAC ATCAGGTATG AATGCCCTTG CATTGGCAGG CTTGATTTAT 

1201 CTGACCGGTT TTACCGTTTT GTTCTTATTG AATTTGGCGG GAATGTTCAA 

1251 ATGA 
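
The position of the partial sequence <SEQ ID 477> within the complete sequence <SEQ ID 479> can be checked by a simple exact-match search. The sketch below is illustrative: it uses only nucleotides 401-500 of SEQ ID 479 as printed above, assumes the partial sequence matches without mismatches over its first 50 nucleotides, and uses a hypothetical helper name.

```python
def locate_fragment(fragment, excerpt, excerpt_start=1):
    """Find fragment in excerpt.

    Returns (1-based nucleotide start, 1-based residue number, frame offset)
    relative to the full sequence, or None if there is no exact match.
    excerpt_start is the 1-based position of the excerpt's first base.
    """
    idx = excerpt.find(fragment)
    if idx < 0:
        return None
    nt_start = excerpt_start + idx
    codon_index, frame = divmod(nt_start - 1, 3)
    return nt_start, codon_index + 1, frame

# First 50 nucleotides of the partial sequence SEQ ID 477.
partial = "GTGAGCGGACGTTACCGCGCTTTGGATCGCGTTTCCAAAATCATCATCGT"
# Nucleotides 401-500 of the complete sequence SEQ ID 479.
excerpt = ("CCTGCCTGATTATTTTGGTGAGCGGACGTTACCGCGCTTTGGATCGCGTT"
           "TCCAAAATCATCATCGTTACTTTGAGTATCGCCACGCTTGCCGCCGCCGG")

print(locate_fragment(partial, excerpt, excerpt_start=401))
```

On these inputs the partial sequence begins at nucleotide 418 of SEQ ID 479, in frame with the start codon, so the first residue of ORF53 (V) corresponds to residue 140 of ORF53-1.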

This corresponds to the amino acid sequence <SEQ ID 480; ORF53-1>:



1 MSEQHISTWK SKINALGPGI MMASAAVGGS HLIASTQAG A LYGWQIALII

51 ILTNLF KYPF FRFSAHYTLD TGKSLIEGYA EKSRVYLW VF LILCILSATI 

101 NAGAV AIVTA AIVKMAIPSL MFD AGTVAAL IMASCLIILV SGRYRALDRV 

151 SK IIIVTLSI ATLAAAGIAM SRGMQMQSDF IEPTPW TLAG LGFLIALMGW 

201 MPAPIEISAI NSLWVTEKQR INPSEYRDGI FDFNVGY IAS AVLALVFLAL 

251 GAFV QYGNGE AVQMAGGKYI GQLINMYAVT IGGWSRPL VA FIAFACMYGT 

301 TITVV DGYAR AIAEPVRLLR GKDKTGNAE F FAWNIWVAGS GLAVIF WFDG

351 VMAN LLKFAM IAAFVSAPVF A WLNYRLVKG DEKHKLTSGM NA LALAGLIY 

401 LTGFTVLFL L NLAGMFK* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF53 shows 93.5% identity over a 139aa overlap with an ORF (ORF53a) from strain A of N. 
meningitidis: 

10 20 30 

orf 53 .pep VSGRYRALDRVSK I I IVTLS I ATLAAAGI A 

II t I I I I I t I I t I I I I I I I I I I I I t I I I I 1 
orf53a A AIVKMAIPSL MFD AGTVAALIMASCLIILV SGRYRALDRVSK IIIVTLSIATLAAAGIA
110 120 130 140 150 160 



40 50 60 70 80 90 

orf53.pep MSRGMQMQSDFIEPTPW TLAGLGFLIALMGWMPA PIEISAINSLWVTEKQRINPSEYRDG
I I I I f I I I I I II 1 I 1 I I I M I I I I I I I II I I I I I II II I i I I II I I i I I I I I I i I I I I I I 
orf 53a MSRGMQMQSDFIEPTPW TLAGLGFLIALMGWMPA PIEISAINSLWVTEKQRINPSEYRDG 
170 180 190 200 210 220 



100 110 120 130 139 

orf 53 . pep IFEFNVGY IASAVLALVFLALGXV APNGNGXTVQMAGGKYNGQLINMYA 
I I : I I I I I I II I II I I I I I I I I : II! : I I I I I I I I I I I I I I I I 
orf 53a IFDFNVGY IASAVLALVFLALGAFV QYGNGEAVQMAGGKYIGQLINMYAVTIGGWSRPLV 
230 240 250 260 270 280 



orf 53a AFIAFACMYGTTITW DGYARAIAEPVRLLRGKDKTGNAEFFAWNIWVAGSGLAVIFWFD 
290 300 310 320 330 340 

The complete length ORF53a nucleotide sequence <SEQ ID 481> is:

1 ATGTCCGAAC AACATATTTC GACTTGGAAA AGTAAAATCA ACGCATTGGG

51 ACCGGGGATT ATGATGGCTT CGGCGGCGGT CGGCGGTTCG CACCTGATTG 

101 CCTCGACGCA GGCGGGCGCG CTTTACGGCT GGCAGATCGC GCTCATCATC

151 ATCCTGACCA ACCTCTTCAA ATACCCGTTT TTCCGCTTCA GCGCGCATTA 

201 CACGCTGGAC ACGGGCAAGA GCCTGATTGA AGGTTATGCC GAGAAAAGCC 

251 GCGTTTATTT GTGGGTATTC CTGATTTTGT GCATCCTCTC CGCCACGATT 

301 AACGCGGGCG CGGTCGCCAT TGTAACCGCC GCCATCGTCA AAATGGCGAT 

351 TCCCTCGCTG ATGTTTGATG CCGGCACGGT TGCCGCCTTG ATTATGGCAT 

401 CCTGCCTGAT TATTTTGGTG AGCGGACGTT ACCGCGCTTT GGATCGCGTT

451 TCCAAAATCA TCATCGTTAC TTTGAGTATC GCCACGCTTG CCGCCGCCGG 

501 CATCGCTATG TCGCGCGGTA TGCAGATGCA GTCCGATTTT ATCGAGCCGA 

551 CACCGTGGAC GCTTGCCGGT TTGGGCTTCC TGATCGCGCT GATGGGCTGG 

601 ATGCCCGCGC CGATTGAAAT TTCCGCCATC AATTCTTTGT GGGTAACCGA 

651 AAAACAACGC ATCAATCCTT CCGAATACCG CGACGGGATT TTTGATTTCA 

701 ACGTCGGTTA TATCGCCAGT GCGGTTTTGG CTTTGGTTTT CCTTGCACTG 

751 GGCGCGTTTG TGCAATACGG CAACGGCGAA GCAGTGCAGA TGGCGGGCGG 

801 CAAATATATC GGGCAATTGA TCAATATGTA CGCCGTTACC ATCGGCGGCT 

851 GGTCGCGCCC GCTGGTGGCG TTTATCGCGT TTGCCTGTAT GTACGGCACG 

901 ACGATTACCG TTGTGGACGG CTATGCCCGT GCCATTGCCG AACCCGTGCG 

951 CCTGCTGCGC GGAAAAGACA AAACGGGCAA CGCCGAATTC TTTGCCTGGA 

1001 ATATTTGGGT GGCGGGCAGC GGTTTGGCGG TGATTTTCTG GTTTGACGGC 

1051 GTAATGGCGA ATCTGCTCAA ATTTGCGATG ATTGCCGCTT TTGTGTCCGC 

1101 CCCTGTGTTT GCCTGGCTGA ATTACCGTTT GGTCAAAGGT GATGAAAAAC 

1151 ACAAACTCAC ATCAGGTATG AATGCCCTTG CATTGGCAGG CTTGATTTAT 

1201 CTGACCGGTT TTACCGTTTT GTTCTTATTG AATTTGGCGG GAATGTTCAA 

1251 ATGA 

This encodes a protein having amino acid sequence <SEQ ID 482>: 



1 MSEQHISTWK SKINALGPGI MMASAAVGGS HLIASTQAG A LYGWQIALII 

51 ILTNLFKYPF FRFSAHYTLD TGKSLIEGYA EKSRVYLW VF LILCILSATI 

101 NAGAV AIVTA AIVKMAIPSL MFD AGTVAAL IMASCLIILV SGRYRALDRV 

151 SK IIIVTLSI ATLAAAGIAM SRGMQMQSDF IEPTPW TLAG LGFLIALMGW 

201 MPAPIEISAI NSLWVTEKQR INPSEYRDGI FDFNVGY IAS AVLALVFLAL 

251 GAFVQYGNGE AVQMAGGKYI GQLINMYAVT IGGWSRPL VA FIAFACMYGT 

301 TITVVDGYAR AIAEPVRLLR GKDKTGNAE F FAWNIWVAGS GLAVIF WFDG

351 VMAN LLKFAM IAAFVSAPVF A WLNYRLVKG DEKHKLTSGM NA LALAGLIY 

401 LTGFTVLFLL NLAGMFK* 

ORF53a shows 100.0% identity in 417 aa overlap with ORF53-1:



10 20 30 40 50 60 

or f 53a . pep MSEQH I STWKSKINALGPGIMMASAAVGGSHLIASTQAGALYGWQIALI I ILTNLFKYPF 
II f M I II I I i I t I I II I I I I I I t I ! I I I t I I t I i I t M I I I I I I I I I M I II I I M If I 
orf53-l MSEQHISTWKSKINALGPGIMMASAAVGGSHLIASTQAGALYGWQIALI I ILTNLFKYPF 

10 20 30 40 50 60 

70 80 90 100 110 120 

or f 53a . pep FRFSAHYTLDTGKSLIEGYAEKSRVYLWVFLILCILSATINAGAVAIVTAAIVKMAIPSL 
I I I II M I 1 I ! ! I I I ! I! I I I I M I I I 1 I I I I I II 1 M I II 1 I I I I I I M I I 1 I I I I II I 
orf 5 3-1 FRFSAHYTLDTGKSLIEGYAEKSRVYLWVFLILCILSATINAGAVAIVTAAIVKMAIPSL 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf 53a . pep MFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIAMSRGMQMQSDF 
I M I II I II I II I I I I II I I I I I M I I I I I II I I I M I I I I I II I I I I I I I I I I M I II I 
orf 53-1 MFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIAMSRGMQMQSDF 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf53a.pep IEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDGIFDFNVGYIAS
i I M I I I M I I I I I I I I I I I I M I I I I I I I I I I I I I I I I II I I I I I I M I I I M I I I M I 
orf 53-1 IEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDGIFDFNVGYIAS 

190 200 210 220 230 240 



250 260 270 280 290 300 

or f 53a . pep AVLALVFLALGAFVQYGNGEAVQMAGGKYIGQLINMYAVTIGGWSRPLVAFIAFACMYGT 
I II II I I II I I I I I I I 1 I I I 1 I I I I I I I I M M I II I I I I I I I I 1 I I II I I I I M I II I I 
orf 53-1 AVLALVFLALGAFVQYGNGEAVQMAGGKYIGQLINMYAVTIGGWSRPLVAFIAFACMYGT 

250 260 270 280 290 300 
310 320 330 340 350 360

orf53a.pep TITVVDGYARAIAEPVRLLRGKDKTGNAEFFAWNIWVAGSGLAVIFWFDGVMANLLKFAM
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53-1 TITVVDGYARAIAEPVRLLRGKDKTGNAEFFAWNIWVAGSGLAVIFWFDGVMANLLKFAM
310 320 330 340 350 360

370 380 390 400 410

orf53a.pep IAAFVSAPVFAWLNYRLVKGDEKHKLTSGMNALALAGLIYLTGFTVLFLLNLAGMFKX
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53-1 IAAFVSAPVFAWLNYRLVKGDEKHKLTSGMNALALAGLIYLTGFTVLFLLNLAGMFKX
370 380 390 400 410

Homology with a predicted ORF from N. gonorrhoeae 

ORF53 shows 92.1% identity over a 139aa overlap with a predicted ORF (ORF53ng) from N.
gonorrhoeae:

orf53.pep VSGRYRALDRVSKIIIVTLSIATLAAAGIA 30
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
orf53ng AAIVKMAIPSLMFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIA 91

orf53.pep MSRGMQMQSDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDG 90
I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I 1 i II I ! 1 I I i M f I M t I I I I 1 I I 1
orf53ng MSRGMQMQPDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDG 151

orf53.pep IFEFNVGYIASAVLALVFLALGXVAPNGNGXTVQMAGGKYNGQLINMYA 139
! I : I I I I II I I I I I I I 1 I I M i : III i I I : I I I i I I I I I I M
orf53ng IFDFNVGYIASAVLALVFLALGAFVQYGNGEAVQMGGGKYIGQLINMYAVTIGGGSRPLV 211



An ORF53ng nucleotide sequence <SEQ ID 483> was predicted to encode a protein having amino 
acid sequence <SEQ ID 484>: 

1 MPKKSCVYLW VFLILC IASA TINAGAVAIV TAAIVKMA IP SLMFDA GTVA 

51 ALIMASCLII LVSGRYRALD RVSK IIIVTL SIATLAAAGI A MSRGMQMQP 

101 DFIEPTPW TL AGLGFLIALM GWMPA PIEIS AINSLWVTEK QRINPSEYRD 

151 GIFDFNVGY I ASAVLALVFL ALGAFV QYGN GEAVQMGGGK YIGQLINMYA 

201 VTIGGGSRPL VAFIAFACMY GAASTVV DGY ARAIAEPVRL LRGKDKTARP 

251 IVLLEKLGGR HRFGRDFLV* 

Further analysis revealed a further partial gonococcal DNA sequence <SEQ ID 485>:

1 ..aagaAAAGCT GCGTTTATTT GTGGGTTTTT TTGATTTTGT GTATCGCCTC

51 CGCCACGATT AACGCGGGCG CGGTCGCCAT TGTAACCGCC GCCATCGTCA 

101 AAATGGCGAT TCCCTCGCTG ATGTTTGATG CCGGCACGGT TGCCGCCTTG 

151 ATTATGGCAT CCTGCCTGAT TATTTTGGTG AGCGGACGTT ACCGCGCTTT 

201 GGATCGTGTT TCCAAAATCA TCATTGTTAC TTTGAGCATC GCCACGCTTG 

251 CCGCCGCCGG CATCGCTATG TCGCGCGGTA TGCAGATGCA GCCCGATTTT 

301 ATCGAGCCGA CACCGTGGAC GCTTGCCGGT TTGGGCTTCC TGATCGCGCT 

351 GATGGGCTGG ATGCCCGCGC CGATCGAAAT TTCCGCCATC AATTCTTTGT 

401 GGGTAACCGA AAAACAACGC ATCAATCCTT CTGAATACCG CGACGGGATT 

451 TTCGATTTCA ACGTCGGTTA TATCGCcagT GCGGTTTTGG CTTTGGTTTT 

501 CCTTGCACTG GGCGCGTTTG TGCAATACGG CAACGGCGAA GCAGTGCAGA

551 TGGCGGGCGG CAAATATATC GGGCAATTGA TTAATATGTA TGCCGTAACC 

601 ATCGGCGGCT GGTCTCGTCC GCTGGTGGCG TTTATCGCGT TTGCCTGTAT 

651 GTACGGCACG ACGATTACCG TTGTGGACGG TTATGCGCGT GCCATTGCCG 

701 AACCCGTGCG CCTGCTGCGC GGCAGGGATA AAACCGGCAA CGCCGAGTTG 

751 TTtgccTGGA ATATTTGGGT GGCGGGCAGC GGTTTGGCGG TGATTTTCTG 

801 GTTTGACggc gcaaTGGCgG AACtgcTCAA ATTTGCGATG ATtgccgcCT 

851 TTGTGTCCGC CCCTGTGTTC GCCTGGCTCA ACTACCGCCT CGTCAAAGGG 

901 GACAAACGCC ACAGGCTTAC CGCCGGTATG AACGCCCTTG CCATTGTCGG 

951 CCTGCTCTAC CTGGCCGGGT TTGCCGTTTT GTTCCTGTTG AACCTTACCG 

1001 GACTTTTGGC ATAG 






This corresponds to the amino acid sequence <SEQ ID 486; ORF53ng-1>:

1 .KKSCVYLWVF LILCIASATI NAGAVAIVTA AIVKMAIPSL MFDAGTVAAL
51 IMASCLIILV SGRYRALDRV SK IIIVTLSI ATLAAAGIAM SRGMQMQPDF
101 IEPTPW TLAG LGFLIALMGW MPA PIEISAI NSLWVTEKQR INPSEYRDGI
151 FDFNVGY IAS AVLALVFLAL GAFV QYGNGE AVQMAGGKYI GQLINMYAVT
201 IGGWSRPL VA FIAFACMYGT TITVV DGYAR AIAEPVRLLR GRDKTGNAEL
251 FAWNIWVAGS GLAVIFWFDG AMAELLKFAM IAAFVSAPVF AWLNYRLVKG
301 DKRHRLTAGM NALAIVGLLY LAGFAVLFLL NLTGLLA*



ORF53ng-l and ORF53-1 show 94.0% identity in 336 aa overlap: 



60 70 80 90 100 110

orf53-1.pep ILTNLFKYPFFRFSAHYTLDTGKSLIEGYAEKSRVYLWVFLILCILSATINAGAVAIVTA
: I I II I I I I I I ! II I I I I I I I I I I I I I I
orf53ng-1 KKSCVYLWVFLILCIASATINAGAVAIVTA
10 20 30



120 130 140 150 160 170 

orf 53-1. pep AIVKMAIPSLMFDAGTVAALIMASCLIILVSGRYRALDRVSKI I IVTLS I ATLAAAGIAM 
I I I I M M M I I I I I I I M I 1 I I M ! 1 I I I I I 1 I I I I 1 I I I M I I 1 I I 1 I I I I I I I I I I I 
orf53ng-l AIVKMAIPSLMFDAGTVAALIMASCLIILVSGRYRALDRVSKI I IVTLS I ATLAAAGIAM 
40 50 60 70 80 90 

180 190 200 210 220 230 

orf 53-1 .pep SRGMQMQSDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDGI 
I I I I I I I I I ( I M I I t I I I I I I I i I 1 I I I M I I I I I I I I I I I I I I II I I I I I I I I II I I 
orf53ng-l SRGMQMQPDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDGI 

100 110 120 130 140 150 

240 250 260 270 280 290 

orf 53-1 . pep FDFNVGYIASAVLALVFLALGAFVQYGNGEAVQMAGGKY I GQLINMYAVT IGGWSRPLVA 
I I M I I I II I I I I I I I 1 II M I I I I I I I I M I M 1 I 11 1 II I I M I I I I I I M I I 1 I I 1 I 
orf53ng-l FDFNVGY I ASAVLALVFLALGAFVQYGNGEAVQMAGGKY I GQLINMYAVT IGGWSRPLVA 

160 170 180 190 200 210 






300 310 320 330 340 350 

orf 53-1 . pep FIAFACMYGTTITVVDGYARAIAEPVRLLRGKDKTGNAEFFAWNIWVAGSGLAVIFWFDG 
I ( i I I I M I i I I I I I I I II I I I I I I I I I I I I : I I I I I I I : I I I I I I I I M I I I I I I I I I I 
orf53ng-l FIAFACMYGTTITVVDGYARAIAEPVRLLRGRDKTGNAELFAWNIWVAGSGLAVIFWFDG 

220 230 240 250 260 270 






360 370 380 390 400 410 

orf 53-1 . pep VMANLLKFAMIAAFVSAPVFAWLNYRLVKGDEKHKLTSGMNALALAGLIYLTGFTVLFLL 
: I I : I I I I M I I I I I I I II I I I I I I I M I M : : I : I I : I I I I I I : : I I : I I : I I : I I 1 I I 
orf53ng-l AMAELLKFAMIAAFVSAPVFAWLNYRLVKGDKRHRLTAGMNALAIVGLLYLAGFAVLFLL 
280 290 300 310 320 330 



orf 53-1. pep NLAGMFKX 
I I : I :: 

orf53ng-l NLTGLLAX 



Based on this analysis, including the presence of a putative leader sequence (double-underlined)
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 58 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 487>:

1 ..TTGCGGGAAA CGGCATATGT TTTGGATAGT TTTGATCGTT ATTTTGTTGT
51 TGCGCTTGCC GGCTTGTTTT TTGTCCGCGC ACAATCCGAA CGCGAGTGGA
101 TGCGCGAGGT TTCTGCGTGG CAGGAAAAGA AAGGGGAAAA ACAGGCGGAG
151 CTGCCTGAAA TCAAAGACGG TATGCCCGAT TTTCCCGAAC TTGCCCTGAT
201 GCTTTTCCAC GCCGTCAAAA CGGCAGTGTA TTGGCTGTTT GTCGGTGTCG
251 TCCGTTTCTG CCGAAACTAT CTGGCGCACG AATCCGAACC GGACAGGCCC
301 GTTCCGCCT..

This corresponds to the amino acid sequence <SEQ ID 488; ORF58>: 

1 ..LRETAYVLDS FDRYFVVALA GLFFVRAQSE REWMREVSAW QEKKGEKQAE

51 LPEIKDGMPD FPELALML FH AVKTAVYWLF VGVV RFCRNY LAHESEPDRP

101 VPP..

Further work revealed the complete nucleotide sequence <SEQ ID 489>: 

1 ATGTTTTGGA TAGTTTTGAT CGTTATTTTG TTGCTTGCGC TTGCCGGCTT 

51 GTTTTTTGTC CGCGCACAAT CCGAACGCGA GTGGATGCGC GAGGTTTCTG 

101 CGTGGCAGGA AAAGAAAGGG GAAAAACAGG CGGAGCTGCC TGAAATCAAA 

151 GACGGTATGC CCGATTTTCC CGAACTTGCC CTGATGCTTT TCCATGCCGT 

201 CAAAACGGCA GTGTATTGGC TGTTTGTCGG TGTCGTCCGT TTCTGCCGAA 

251 ACTATCTGGC GCACGAATCC GAACCGGACA GGCCCGTTCC GCCTGCTTCT 

301 GCAAACCGTG CGGATGTTCC GACCGCATCC GACGGATATT CAGACAGTGG 

351 AAACGGGACG GAAGAAGCGG AAACGGAAGA AGCAGAAGCT GCGGAGGAAG 

401 AGGCTGCCGA TACGGAAGAC ATTGCAACTG CCGTAATCGA CAACCGCCGC

451 ATCCCATTCG ACCGGAGTAT TGCTGAAGGG TTGATGCCGT CTGAAAGCGA 

501 AATTTCGCCC GTCCGTCCGG TTTTTAAAGA AATCACTTTG GAAGAAGCAA 

551 CGCGTGCTTT AAACAGCGCG GCTTTAAGGG AAACGAAAAA ACGCTATATC 

601 GATGCATTTG AGAAAAACGA AACAGCGGTC CCCAAAGTCC GCGTGTCCGA 

651 TACCCCGATG GAAGGGCTGC AGATTATCGG TTTGGACGAC CCTGTGCTTC 

701 AACGCACGTA TTCCCATATG TTCGATGCGG ACAAAGAAGC GTTTTCCGAG

751 TCTGCGGATT ACGGATTTGA GCCGTATTTT GAGAAGCAGC ATCCGTCTGC

801 CTTTTCTGCA GTCAAAGCCG AAAATGCACG GAATGCGCCG TTCCACCGTC 

851 ATGCAGGGCA GGGGAAAGGG CAGGCGGAGG CAAAATCCCC GGATGTTTCC 

901 CAAGGGCAGT CCGTTTCAGA CGGCACGGCC GTCCGCGATG CCCGCCGCCG 

951 CGTTTCCGTC AATTTGAAAG AACCGAACAA GGCAACGGTT TCTGCGGAGG 

1001 CGCGAATTTC TCGCCTGATT CCGGAAAGTC AGACGGTTGT CGGGAAACGG 

1051 GATGTCGAAA TGCCGTCTGA AACCGAAAAT GTTTTCACGG AAACCGTTTC 

1101 GTCTGTGGGA TACGGCGGTC CGGTTTATGA TGAAACTGCC GATATCCATA 

1151 TTGAAGAACC TGCCGCGCCC GATGCTTGGG TGGTCGAACC ACCCGAAGTG 

1201 CCGAAAGTTC CCATGACCGC AATCGATATT CAGCCGCCGC CTCCCGTATC 

1251 GGAAATCTAC AACCGTACCT ATGAACCGCC GTCAGGATTC GAGCAGGTGC 

1301 AACGCAGCCG CATTGCCGAG ACCGACCATC TTGCCGATGA TGTTTTGAAT 

1351 GGAGGTTGGC AGGAGGAAAC CGCCGCTATT GCGGATGACG GCAGTGAAGG 

1401 TGCGGCAGAG CGGTCAAGCG GGCAATATCT GTCGGAAACC GAAGCGTTCG 

1451 GGCATGACAG TCAGGCGGTT TGTCCGTTTG AAAATGTGCC GTCTGAACGC 

1501 CCGTCCTGCC GGGTATCGGA TACGGAAGCG GATGAAGGGG CGTTCCCATC 

1551 TGAAGAAACC GGTGCGGTAT CCGAACACCT GCCGACAACC GACCTGCTTC 

1601 TGCCTCCGCT GTTCAATCCC GAGGCGACGC AAACCGAAGA AGAACTGTTG 

1651 GAAAACAGCA TCACCATCGA AGAAAAATTG GCGGAGTTCA AAGTCAAGGT 

1701 CAAGGTTGTC GATTCTTATT CCGGCCCCGT AATTACGCGT TATGAAATCG

1751 AACCCGATGT CGGCGTGCGC GGCAATTCCG TTCTGAATCT GGAAAAAGAT 

1801 TTGGCGCGTT CGCTCGGCGT GGCTTCCATC CGCGTTGTCG AAACCATCCC 

1851 CGGCAAAACC TGCATGGGTT TGGAACTTCC GAACCCGAAA CGCCAAATGA 

1901 TACGCCTGAG CGAAATCTTC AATTCGCCCG AGTTTGCCGA ATCCAAATCC 

1951 AAGCTGACGC TCGCGCTCGG TCAGGACATC ACCGGACAGC CCGTCGTAAC 

2001 CGACTTGGGA AAAGCACCGC ATTTGTTGGT TGCCGGCACG ACCGGTTCGG 

2051 GCAAATCGGT GGGTGTCAAC GCGATGATTC TGTCTATGCT TTTCAAAGCC 

2101 GCGCCGGAAG ACGTGCGTAT GATTATGATC GATCCGAAAA TGCTGGAATT

2151 GAGCATTTAC GAAGGCATCC CGCACCTGCT CGCCCCTGTC GTTACCGATA 

2201 TGAAGCTGGC GGCAAACGCG CTGAACTGGT GTGTTAACGA AATGGAAAAA 

2251 CGCTACCGCC TGATGAGCTT TATGGGCGTG CGTAATCTTG CGGGCTTCAA 

2301 TCAAAAAATC GCCGAAGCCG CAGCAAGGGG AGAAAAAATC GGCAATCCGT 

2351 TCAGCCTCAC GCCCGACGAT CCCGAACCTT TGGAAAAACT GCCGTTTATC 

2401 GTGGTCGTGG TCGATGAGTT TGCCGACCTG ATGATGACGG CAGGCAAGAA

2451 AATCGAAGAA CTGATTGCCC GCCTCGCCCA AAAAGCCCGC GCGGCAGGCA 

2501 TCCATTTGAT TCTTGCCACA CAACGCCCCA GCGTCGATGT CATCACGGGT 

2551 CTGATTAAGG CGAACATCCC GACGCGTATC GCGTTCCAAG TGTCCAGCAA 

2601 AATCGACAGC CGCACGATTC TCGACCAAAT GGGCGCGGAA AACCTGCTCG 

2651 GTCAGGGCGA TATGCTGTTC CTGCTGCCGG GTACTGCCTA TCCGCAGCGC 

2701 GTTCACGGCG CGTTTGCCTC GGATGAAGAG GTGCACCGCG TGGTCGAATA

2751 TTTGAAACAG TTTGGCGAAC CGGACTATGT TGACGATATT TTGAGCGGCG

2801 GCGGCAGCGA AGAGCTGCCC GGCATCGGGC GCAGCGGCGA CGACGAAACC 



2851 GATCCGATGT ACGACGAGGC CGTATCCGTT GTCCTGAAAA CGCGCAAAGC

2901 CAGCATTTCG GGCGTACAGC GCGCCTTGCG TATCGGCTAC AACCGCGCCG 

2951 CGCGTCTGAT TGACCAGATG GAGGCGGAAG GCATTGTGTC CGCACCGGAA 

3001 CACAACGGCA ACCGTACGAT TCTCGTCCCC TTGGACAATG CTTGA 

This corresponds to the amino acid sequence <SEQ ID 490; ORF58-1>:

1 MFWIVLIVIL LLALAGLFFV RAQS EREWMR EVSAWQEKKG EKQAELPEIK

51 DGMPDFPELA LM LFHAVKTA VYWLFVGVV R FCRNYLAHES EPDRPVPPAS

101 ANRADVPTAS DGYSDSGNGT EEAETEEAEA AEEEAADTED IATAVIDNRR 

151 IPFDRSIAEG LMPSESEISP VRPVFKEITL EEATRALNSA ALRETKKRYI 

201 DAFEKNETAV PKVRVSDTPM EGLQIIGLDD PVLQRTYSHM FDADKEAFSE 

251 SADYGFEPYF EKQHPSAFSA VKAENARNAP FHRHAGQGKG QAEAKSPDVS 

301 QGQSVSDGTA VRDARRRVSV NLKEPNKATV SAEARISRLI PESQTVVGKR

351 DVEMPSETEN VFTETVSSVG YGGPVYDETA DIHIEEPAAP DAWVVEPPEV 

401 PKVPMTAIDI QPPPPVSEIY NRTYEPPSGF EQVQRSRIAE TDHLADDVLN 

451 GGWQEETAAI ADDGSEGAAE RSSGQYLSET EAFGHDSQAV CPFENVPSER 

501 PSCRVSDTEA DEGAFPSEET GAVSEHLPTT DLLLPPLFNP EATQTEEELL 

551 ENSITIEEKL AEFKVKVKVV DSYSGPVITR YEIEPDVGVR GNSVLNLEKD

601 LARSLGVASI RVVETIPGKT CMGLELPNPK RQMIRLSEIF NSPEFAESKS 

651 KLTLALGQDI TGQPVVTDLG KAPHLLVAGT TGSGKSVGVN AMILSMLFKA 

701 APEDVRMIMI DPKMLELSIY EGIPHLLAPV VTDMKLAANA LNWCVNEMEK 

751 RYRLMSFMGV RNLAGFNQKI AEAAARGEKI GNPFSLTPDD PEPLEK LPFI 

801 VVVVDEFADL MMTA GKKIEE LIARLAQKAR AAGIHLILAT QRPSVDVITG

851 LIKANIPTRI AFQVSSKIDS RTILDQMGAE NLLGQGDMLF LLPGTAYPQR 

901 VHGAFASDEE VHRVVEYLKQ FGEPDYVDDI LSGGGSEELP GIGRSGDDET

951 DPMYDEAVSV VLKTRKASIS GVQ RALRIGY NRAARLIDQM EAEGIVSAPE 

1001 HNGNRTILVP LDNA* 

Computer analysis of this amino acid sequence predicts the indicated transmembrane region, and
also gave the following results:



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF58 shows 96.6% identity over an 89 aa overlap with an ORF (ORF58a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf58 .pep LRETAYVLDSFDRYFW ALAGLFFVRAQS EREWMREVSAWQEKKGEKQAELPEIKDGMPD 

::: M I I I I I I I I II I M I I I I I I I I I I I I I I I M I I I I I I I I I I I 
orf58a MFWIVLIVILLLALAGLFFVRAQS EREWMREVSAWQEKKGEKQAELPEIKDGMPD 

10 20 30 40 50 



70 80 90 100 

orf 58 .pep FPELALM LFHAVKTAVYWLFVGW RFCRNYLAHESEPDRPVPP 
I M I I I I I I II I I I I I I II I I I I I II I I I II I I I I I I I I I I I I 
orf 58a FPELALM LFHAVKTAVYWLFVGVV RFCRNYLAHESEPDRPVPPASANRADVPTASDGYSD 
60 70 80 90 100 110 

The complete length ORF58a nucleotide sequence <SEQ ID 491> is: 



1 ATGTTTTGGA TAGTTTTGAT CGTTATTTTG TTGCTTGCGC TTGCCGGCTT 

51 GTTTTTTGTC CGCGCACAAT CCGAACGCGA GTGGATGCGC GAGGTTTCTG 

101 CGTGGCAGGA AAAGAAAGGG GAAAAACAGG CGGAGCTGCC TGAAATCAAA 

151 GACGGTATGC CCGATTTTCC CGAACTTGCC CTGATGCTTT TCCATGCCGT 

201 CAAAACGGCA GTGTATTGGC TGTTTGTCGG TGTCGTCCGT TTCTGCCGAA 

251 ACTATCTGGC GCACGAATCC GAACCGGACA GGCCCGTTCC GCCTGCTTCT 

301 GCAAATCGTG CGGATGTTCC GACCGCATCC GACGGATATT CAGACAGTGG 

351 AAACGGGACG GAAGAAGCGG AAACGGAAGA AGCAGAAGCT GCGGAGGAAG 

401 AGGCTGCCGA TACGGAAGAC ATTGCAACTG CCGTAATCGA CAACCGCCGC 

451 ATCCCATTCG ACCGGAGTAT TGCTGAAGGG TTGATGCCGT CTGAAAGCGA 

501 AATTTCGCCC GTCCGTCCGG TTTTTAAGGA AATCACTTTG GAAGAAGCAA 

551 CGCGTGCTTT AAACAGCGCG GCTTTAAGGG AAACGAAAAA ACGCTATATC 

601 GATGCATTTG AGAAAAACGA AACAGCGGTC CCCAAAGTCC GCGTGTCCGA 

651 TACCCCGATG GAAGGGCTGC AGATTATCGG TTTGGACGAC CCTGTGCTTC 

701 AACGCACGTA TTCCCGTATG TTCGATGCGG ACAAAGAAGC GTTTTCCGAG 

751 TCTGCGGATT ACGGATTTGA GCCGTATTTT GAGAAGCAGC ATCCGTCTGC 

801 CTTTTCTGCA GTCAAAGCCG AAAATGCACG GAATGCGCCG TTCCGCCGTC 

851 ATGCAGGGCA GGGNAAAGGG CAGGCGGAGG CNAAATCCCC GGATGTTTCC 

901 CAAGGGCAGT CCGTTTCAGA CGGCACAGCC GTCCGCGATG CCNGCCGCCG 

951 CGTTTCCGTC AATTTGAAAG AACCGAACAA GGCAACGGTT TCTGCGGAGG 

1001 CGCGGATTTC GCGCCTGATT CCGGAAAGTC GGACGGTTGT CGGGAAACGG 

1051 GATGTCGAAA TGCCGTCTGA AACCGAAAAT GTTTTCACGG AAANTGTTTC 

1101 GTCTGTGGGA TACGGCGNTC CGGTTTATGA TGAAACTGCC GATATCCATA 

1151 TTGAAGAACC TGCCGCGCCC GATGCTTGGG TGGTCGAACC ACCCGAAGTG 

1201 CCGAAAGTTC CCATGCCCGC AATNGATATT CCGCCGCCGC CTCCCGTATC 

1251 GGAAATCTAC AACCGTACCT ATGAACCGCC GGCAGGATTC GAGCAGGTGC 

1301 AACGCAGCCG CATTGCCGAA ACCGATCATC TTGCCGATGA TGTTTTGAAT 

1351 GGAGGTTGGC AGGAGGAAAC CGCCGCTATT GCGAATGACG GCAGTGAGGG 

1401 TGTGGCAGAG CGGTCAAGCG GGCAATATTT GTCGGAAACC GAAGCGTTCG 

1451 GGCATGACAG TCAGGCGGTT TGTCCGTTTG AAAATGTGCC GTCTGAACGC 

1501 CCGTCCCGCC GGGCATNGGA TACGGAAGCG GATGAAGGGG CGTTCCAATC 

1551 TGAAGAAACC GGTGCGGTAT CCGAACACCT GCCGACAACC GACCTGCTTC 

1601 TGCCGCCGCT GTTCAATCCC GGGGCGACGC AAACCGAAGA AGANCTGTTG 

1651 GANAACAGCA TCACCATCGA AGAAAAATNG GCGGAGTTCA AAGTCAAGGT 

1701 CAAGGTTGTC GATTCTTATT CCGGCCCCGT GATTACGCGT TATGAAATCG 

1751 AACCCGATGT CGGCGTGCGC GGCAATTCCG TTCTAAATCT GGAAAAAGAN 

1801 TTGGCGCGTT CGCTCGGCGT GGCTTCCATC CGCGTTGTCG AAACCATCCT 

1851 CGGCAAAACC TGTATGGGTT TGGAACTTCC GAACCCGAAA CGCCAAATGA 

1901 TACGCCTGAG CGAAATCTTC AATTCGCCCG AGTTTGCCGA ATCCAAATCC 

1951 AAGCTGACGC TCGCGCTCGG TCAGGACATC ACCGGACAGC CCGTCGTAAC 

2001 CGACTTGGGC AAAGCACCGC ATTTGTTGGT TGCCGGCACG ACCGGTTCGG 

2051 GCAAATCGGT GGGTGTCAAC GCGATGATTC TGTCTATGCT TTTCAAAGCC 

2101 GCGCCGGAAG ACGTGCGTAT GATTATGATC GATCCGAAAA TGCTGGAATT 

2151 GAGCATTTAC GAAGGCATCC CGCACCTGCT CGCCCCTGTC GTTACCGATA 

2201 TGAAGCTGGC GGCAAACGCG CTGAACTGGT GTGTTAACGA AATGGAAAAA 

2251 CGCTACCGCC TGATGAGCTT TATGGGCGTG CGCAATCTTG CGGGTNTCAA 

2301 TCAAAAAATC GCCGAAGCCG CAGCAAGGGG GGAGAAAATC GGCAACCCGT 

2351 TCAGCCTCAC GCCCGACAAT CCCGAACCTT TGGANAAATT GCCGTTTATC 

2401 GTGGTCGTGG TTGATGAGTT TGCCGACCTG ATGATGACGG CAGGCAAGAA 

2451 AATCGAAGAA CTGATTGCCC GCCTCGCCCA AAAAGCCCGC GCGGCAGGCA 

2501 TCCATCTTAT CCTTGCCACA CAACGCCCCA GTGTCGATGT CATCACGGGT 

2551 CTGATTAAGG CGAACATCCC GACGCGTATC GCGTTCCAAG TGTCCAGCAA 

2601 AATCGACAGC CGCACGATTC TTGACCAAAT GGGTGCGGAA AACCTGCTCG 

2651 GGCAGGGCGA TATGCTGTTC CTGCCGCCGG GTACGGCCTA TCCGCAGCGC 

2701 GTTCACGGCG CGTTTGCCTC GGATGAAGAG GTGCACCGCG TGGTCGAATA 

2751 TCTGAAACAG TTTGGCGAAC CGGACTATGT TGACGATATN TTGAGCGGCG 

2801 GTATGTCCGA CGATTTGCTG GGAATCAGCC GGAGCGGCGA CGGCGAAACC 

2851 GATCCGATGT ACGACGAGGC CGTGTCNGTT GTTTTGAAAA CGCGCAAAGC 

2901 CAGCATTTCT GGCGTGCAGC GCGCATTGCG TATCGGCTAT AATCGCGCCG 

2951 CGCGTCTGAT TGACCAGATG GAGGCGGAAG GCATTGTGTC CGCACCGGAA 

3001 CACAACGGCA ACCGTACGAT TCTCGTCCCC TTNGACAATG CTTGA 

This encodes a protein having amino acid sequence <SEQ ID 492>: 



1 MFWIVLIVIL LLALAGLFFV RAQS EREWMR EVSAWQEKKG EKQAELPEIK 

51 DGMPDFPELA LM LFHAVKTA VYWLFVGW R FCRNYLAHES EPDRPVPPAS 

101 ANRADVPTAS DGYSDSGNGT EEAETEEAEA AEEEAADTED IATAVIDNRR 

151 IPFDRSIAEG LMPSESEISP VRPVFKEITL EEATRALNSA ALRETKKRYI 

201 DAFEKNETAV PKVRVSDTPM EGLQIIGLDD PVLQRTYSRM FDADKEAFSE 

251 SADYGFEPYF EKQHPSAFSA VKAENARNAP FRRHAGQGKG QAEAKSPDVS 

301 QGQSVSDGTA VRDAXRRVSV NLKEPNKATV SAEARISRLI PESRTWGKR 

351 DVEMPSETEN VFTEXVSSVG YGXPVYDETA DIHIEEPAAP DAWVVEPPEV 

401 PKVPMPAXDI PPPPPVSEIY NRTYEPPAGF EQVQRSRIAE TDHLADDVLN 

451 GGWQEETAAI ANDGSEGVAE RSSGQYLSET EAFGHDSQAV CPFENVPSER 

501 PSRRAXDTEA DEGAFQSEET GAVSEHLPTT DLLLPPLFNP GATQTEEXLL 

551 XNSITIEEKX AEFKVKVKW DSYSGPVITR YEIEPDVGVR GNSVLNLEKX 

601 LARSLGVASI RWETILGKT CMGLELPNPK RQMIRLSEIF NSPEFAESKS 

651 KLTLALGQDI TGQPWTDLG KAPHLLVAGT TGSGKSVGVN AMILSMLFKA 

701 APEDVRMIMI DPKMLELSIY EGIPHLLAPV VTDMKLAANA LNWCVNEMEK 

751 RYRLMSFMGV RNLAGXNQKI AEAAARGEKI GNPFSLTPDN PEPLXKLPFI 

801 VVVVDEFADL MMT AGKKIEE LIARLAQKAR AAGIHLILAT QRPSVDVITG 

851 LIKANIPTRI AFQVSSKIDS RTILDQMGAE NLLGQGDMLF LPPGTAYPQR 

901 VHGAFASDEE VHRVVEYLKQ FGEPDYVDDX LSGGMSDDLL GISRSGDGET 

951 DPMYDEAVSV VLKTRKASIS GVQRALRIGY NRAARLIDQM EAEGIVSAPE 

1001 HNGNRTILVP XDNA* 
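
The step from a nucleotide sequence to the protein it encodes can be reproduced with a short script. The sketch below is illustrative only and is not part of the original analysis: it assumes the standard bacterial genetic code, reads in frame from the first base, and resolves a codon containing N only when every possible completion gives the same residue (otherwise it reports X). The function names `translate` and `translate_codon` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_codon(codon):
    """Translate one codon; 'N' is expanded to all four bases."""
    options = [BASES if base == "N" else base for base in codon]
    residues = {CODON_TABLE["".join(p)] for p in product(*options)}
    # An ambiguous codon is only called when all expansions agree.
    return residues.pop() if len(residues) == 1 else "X"


def translate(dna):
    """Translate a DNA string in frame 1; spaces are ignored."""
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3  # drop an incomplete trailing codon
    return "".join(translate_codon(dna[i:i + 3]) for i in range(0, usable, 3))


print(translate("ATGTTTTGGA TAGTTTTGAT CGTTATTTTG"))  # start of SEQ ID 491
```

Applied to bases 931-948 of SEQ ID 491 (GTCCGCGATGCCNGCCGC), this sketch gives VRDAXR, in line with the X at residue 315 of SEQ ID 492, whereas a codon such as GGN is still read as G.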

ORF58a and ORF58-1 show 96.6% identity in 1014 aa overlap: 






10 20 30 40 50 60 

orf58a pep MFWIVLIVILLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPELA 
| | ( | | I | || | 1 I I i I II I I I I i I I I II I I I I I M I i I ! I i I i i I I I I I I t I t t I I I I M ! 
orf 58-1 MFWIVLIVILLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPELA 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf58a pep LMLFHAVKTAVYWLFVGWRFCRNYLAHESEPDRPVPPASANRADVPTASDGYSDSGNGT 
I I I I I I I I I I I I I I I I I I 11 I I I I 1 I I M I I I 1 I I I I i 1 t I t I I t I I I I I M I I I t I t I I 
orf58-l LMLFHAVKTAVYWLFVGWRFCRNYLAHESEPDRPVPPASANRADVPTASDGYSDSGNGT 
70 80 90 100 110 120 

130 140 150 160 170 180 

orf 58a. pep EEAETEEAEAAEEEAADTEDIATAVIDNRRIPFDRSIAEGLMPSESEISPVRPVFKEITL 
I I I I M I M I I I I I i I 1 I! M II I I I II 1 I II 1 I I I i I I I M i I I I I 1 i i i I I I 1 i i I i 1 
orf 58-1 EEAETEEAEAAEEEAADTEDIATAVIDNRRIPFDRSIAEGLMPSESEISPVRPVFKEITL 
130 140 150 160 170 180 

190 200 210 220 230 240 

orf 58a . pep EEATRALNSAALRETKKRYIDAFEKNETAVPKVRVSDTPMEGLQIIGLDDPVLQRTYSRM 
1 I I I I I I I I I I I I I I I I I I I II I 1 I I M II I I I M I i I 1 II I 1 M M I I I I I I I M I I : I 
orf 58-1 EEATRALN S AALRET KKRY I DAFEKNETAV PKVRVS DT PMEGLQ 1 1 GL DD P VLQRT YSHM 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 58a . pep FDADKEAFSESADYGFEPYFEKQHPSAFSAVKAENARNAPFRRHAGQGKGQAEAKSPDVS 
I ! I I I 1 I I I I I I I I I 1 I I I I 1 ! I I I I I II I II I I 1 I I I M I : 1 I I M I I I I I M I M I I I 
orf 58-1 FDADKE AFS E S AD YGFE P Y FEKQH PS AFS AVKAENARNAP FHRHAGQGKGQAE AKS P DVS 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 58a . pep QGQSVSDGTAVRDAXRRVSVNLKEPNKATVSAEARISRLIPESRTVVGKRDVEMPSETEN 
I I I I I 1 I I I I I I I I M 1 I I I I I I I I I I I 1 I I 1 I I I I I I I I 1 I : M I M I 1 M I I I I I I I 
orf 58-1 QGQS VS DGTAVRDARRRVS VNLKE PNKATVSAEARI SRL I PE SQT VVGKRDVEMPSETEN 

310 320 330 340 350 360 






370 380 390 400 410 420 

orf 58a . pep VFTEXVSSVGYGXPVYDETADIHIEEPAAPDAWWEPPEVPKVPMPAXDIPPPPPVSEIY 
1111:11(1111 I I II I I I I I I II I I II I 1 I I I I I I I I I I I M I I II I I I II I I I I 
orf 58-1 VFTETVSSVGYGGPVYDETADIHIEEPAAPDAWVVEPPEVPKVPMTAIDIQPPPPVSEIY 

370 380 390 400 410 420 






430 440 450 460 470 480 

orf 58a. pep NRTYEPPAGFEQVQRSRIAETDHLADDVLNGGWQEETAAIANDGSEGVAERSSGQYLSET 
I I I I I M : I I I I I I I II I I I I I I I I I I II i I I I I I i I I I I ( : I I I M : I I I I I I I I I I I I 
orf 58-1 NRTYEPPSGFEQVQRSRIAETDHLADDVLNGGWQEETAAIADDGSEGAAERSSGQYLSET 

430 440 450 460 470 480 

490 500 510 520 530 540 

orf 58a. pep EAFGHDSQAVCPFENVPSERPSRRAXDTEADEGAFQSEETGAVSEHLPTTDLLLPPLFNP 
t I I I I I I I II I I I I I I I I I I I I I : I I I I M I I I I I II I I i I I M II I I I I I I i I i I I 
orf 58-1 EAFGHDSQAVCPFENVPSERPSCRVSDTEADEGAFPSEETGAVSEHLPTTDLLLPPLFNP 

490 500 510 520 530 540 

550 560 570 580 590 600 

orf 58a. pep GATQTEEXLLXNSITIEEKXAEFKVKVKWDSYSGPVITRYEIEPDVGVRGNSVLNLEKX 

11(111 II I t I I II I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I) I I I 
orf 58-1 EATQTEEELLENSITIEEKLAEFKVKVKVVDSYSGPVITRYEIEPDVGVRGNSVLNLEKD 

550 560 570 580 590 600 






610 620 630 640 650 660 

orf 58a. pep LARSLGVASIRVVETILGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDI 
II I I I I I 1 II I I I I 1 I I M I I I I I I I I I I I I I I I I II I I I I I I I I I I I M I M I I I I II 
orf 58-1 LARSLGVASIRWETIPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDI 

610 620 630 640 650 660 



670 680 690 700 710 720 

orf58a pep TGQPWTDLGKAPHLLVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIY 
I I I 1 I I 1 I 1 I I I [ I i I I I I I II ! t I I t I I ( I i I I I t I I 1 II I I I I I I II I I I t I I I I I I I 
orf 58-1 TGQPWTDLGKAPHLLVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIY 
670 680 690 700 710 720 

730 740 750 760 770 780 

orf58a pep EGIPHLLAPVVTDMKLAANALNWCVNEMEKRYRLMSFMGVRNLAGXNQKIAEAAARGEKI 
I 1 I I I I I t I I I I I I I I I I I I I I I I I I M I I I I t I I I M I I II I I I II I I I I I I I I I I 1 I 
orf58-l EGIPHLLAPWTDMKLAANALNWCVNEMEKRYRLMSFMGVRNLAGFNQKIAEAAARGEKI 
730 740 750 760 770 780 

790 800 810 820 830 840 

orf58a pep GNPFSLTPDNPEPLXKLPFIVWVDEFADLMMTAGKKIEELIARLAQKARAAGIHLILAT 
I I | I I I I 1 I : I I I I I I I I I I I I I I I I I I I I I I II I I I 1 M I I I I I I I ! I I M M I II I I 
orf 58-1 GNPFSLTPDDPEPLEKLPFIVWVDEFADLMMTAGKKIEELIARLAQKARAAGIHLILAT 

790 800 810 820 830 840 

850 860 870 880 890 900 

orf 58a . pep QRPSVDVITGLIKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLPPGTAYPQR 

I ! I I I j I I I M I I M I I I M I 1 I II II I I I M I I I I I I I I I II I I I I I M ! I i I I I I I I 
orf 58-1 QRPSVDVI TGLIKANI PTRI AFQVSSKI DSRT I LDQMGAENLLGQGDMLFLLPGTAYPQR 

850 860 870 880 890 900 

910 920 930 940 950 960 

orf 58a. pep VHGAFASDEEVHRVVEYLKQFGEPDYVDDXLSGGMSDDLLGISRSGDGETDPMYDEAVSV 

II I II I II II I I II I I I I I I II I I I I I I I I I i I I : : I 11:1111 I I I I I I I I I I I I 
orf 58-1 VHGAFASDEEVHRWEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDDETDPMYDEAVSV 

910 920 930 940 950 960 



970 980 990 1000 1010 

orf 58a . pep VLKTRKAS I SGVQRALRIGYNRAARL I DQMEAEGI VSAPEHNGNRT I LVPXDNAX 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 11 I I I I I M I I I I I I I I I I 
orf 58-1 VLKTRKAS I SGVQRALRIGYNRAARLIDQMEAEGIVSAPEHNGNRT I LVPLDNAX 

970 980 990 1000 1010 
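
The differences between two aligned sequences can also be listed position by position. The following is a minimal sketch, not the method used to produce the alignment above; it assumes the two strings are already aligned and of equal length, and it skips positions where either sequence carries X, since an X marks an unresolved residue rather than a known substitution. The function name `list_substitutions` is hypothetical.

```python
def list_substitutions(seq_a, seq_b, start=1, unknown="X"):
    """Return (position, residue_a, residue_b) for each known mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    substitutions = []
    for offset, (a, b) in enumerate(zip(seq_a, seq_b)):
        # Identical residues and unresolved residues are not substitutions.
        if a == b or unknown in (a, b):
            continue
        substitutions.append((start + offset, a, b))
    return substitutions


# Residues 401-420 of ORF58a and ORF58-1, as listed in the sequences above.
orf58a = "PKVPMPAXDIPPPPPVSEIY"
orf58_1 = "PKVPMTAIDIQPPPPVSEIY"
print(list_substitutions(orf58a, orf58_1, start=401))
```

For this stretch the sketch reports substitutions at positions 406 (P/T) and 411 (P/Q) and ignores the unresolved residue at 408.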



Homology with a predicted ORF from N. gonorrhoeae 

ORF58 shows complete identity over a 9 aa overlap with a predicted ORF (ORF58ng) from N. 
gonorrhoeae: 

orf 58 . pep ALMLFHAVKTAVYWLFVGVVRFCRNYLAHESEPDRPVPP 103 

I I I I I I I I I 

orf58ng SEPDRPVPPASANRADVPTASDGYSDSGNG 30 

The ORF58ng nucleotide sequence <SEQ ID 493> is predicted to encode a protein having partial 
amino acid sequence <SEQ ID 494>: 



1 ..SEPDRPVPPA SANRADVPTA SDGYSDSGNG TEEAETEAAE AAEEEAADTE 

51 DIATAVIDNR RIPFDRSIAE GLMQSESKTS PVRPVFKEIT LEEATRALSS 

101 AALRETKKRY IDAFEKNGTA VPKVRVSDTP MEGLQIIGLD DPVLQRTYSR 

151 MFDADKEAFS ESADYGFEPY FEKQHPSAFS AVKAENARNA PFRRHAGQEK 

201 GQAEAKSPDV SQGQSVSDGT AVRDARRRVS VNLKEPNKAT VSAEARISRL 

251 IPESRTWGK RDVEMPSETE NVFTETVSSV GYGGPVYDEA ADIHIEEPAA 

301 PDAWWEPPE VPEVAVPEID ILPPPPVSEI YNRTYEPPAG FEQAQRSRIA 

351 ETDHLAADVL NGGWQEETAA IADDGSEGAA ERSSGQYLSE TEAFGHDSQA 

401 VCPFEDVPSE RPSCRVSDTE ADEGAFQSEE TGAVSEHLPT TDLLLPPLFN 

451 PEATQTEEEL LENSITIEEK LAEFKVKVKV VDSYSGPVIT RYEIEPDVGV 

501 RGNSVLNLEK DLARSLGVAS IRVVETIPGK TCMGLELPNP KRQMIRLSEI 

551 FNSPEFAESK SKLTLALGQD ITGQPWTDL GKAPHLLVAG TTGSGKSVGV 

601 NAMILSMLFK AAPEDVRMIM IDPKMLELSI YEGITHLLAP WTDMKLAAN 

651 ALNWCVNEME KRYRLMSFMG VRNLAGFNQK IAEAAARGEK IGNPFSLTPD 

701 DPEPLEK LPF IVWVDEFAD LMMT AGKKIE ELIARLAQKA RAAGIHLILA 

751 TQRPSVDVIT GLIKANIPTR IAFQVSSKID SRTILDQMGA ENLLGQGDML 

801 FLPPGTAYPQ RVHGAFASDE EVHRVVEYLK QFGEPDYVDD ILSGGGSEEL 

851 PGIGRSGDGE TDPMYDEAVS WLKTRKASI SGVQRALRIG YNRAARLIDQ 

901 MEAEGIVSAP EHNGNRTILV PLDNA* 

This partial gonococcal sequence contains a predicted transmembrane region and a predicted 
ATP/GTP-binding site motif A (P-loop; double underlined). Furthermore, it has a domain 
homologous to the FtsK cell division protein of E. coli. Alignment of ORF58ng and FtsK 
(accession number P46889) shows 65% amino acid identity over a 459 aa overlap: 



ORF58ng:  467 IEEKLAEFKVKVKVVDSYSGPVITRYEIEPDVGVRGNSVLNLEKDLARSLGVASIRVVET 526
              +E +LA+F++K  VV+   GPVITR+E+    GV+   + NL +DLARSL   ++RVVE
FtsK:     868 VEARLADFRIKADVVNYSPGPVITRFELNLAPGVKAARISNLSRDLARSLSTVAVRVVEV 927

ORF58ng:  527 IPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDITGQPVVTDLGKAPHL 586
              IPGK  +GLELPN KRQ + L E+ ++ +F ++ S LT+ LG+DI G+PVV DL K PHL
FtsK:     928 IPGKPYVGLELPNKKRQTVYLREVLDNAKFRDNPSPLTVVLGKDIAGEPVVADLAKMPHL 987

ORF58ng:  587 LVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIYEGITHLLAPVVTDMK 646
              LVAGTTGSGKSVGVNAMILSML+KA PEDVR IMIDPKMLELS+YEGI HLL  VVTDMK
FtsK:     988 LVAGTTGSGKSVGVNAMILSMLYKAQPEDVRFIMIDPKMLELSVYEGIPHLLTEVVTDMK 1047

ORF58ng:  647 LAANALNWCVNEMEKRYRLMSFMGVRNLAGFNQKIAEAAARGEKIGNPFSLTPDDPEP-- 704
               AANAL WCVNEME+RY+LMS +GVRNLAG+N+KIAEA      I +P+    D  +
FtsK:    1048 DAANALRWCVNEMERRYKLMSALGVRNLAGYNEKIAEADRMMRPIPDPYWKPGDSMDAQH 1107

ORF58ng:  705 --LEKLPFIVVVVDEFADLMMTAGKKIEELIARLAQKARAAGIHLILATQRPSVDVITGL 762
                L+K P+IVV+VDEFADLMMT GKK+EELIARLAQKARAAGIHL+LATQRPSVDVITGL
FtsK:    1108 PVLKKEPYIVVLVDEFADLMMTVGKKVEELIARLAQKARAAGIHLVLATQRPSVDVITGL 1167

ORF58ng:  763 IKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLPPGTAYPQRVHGAFASDEEV 822
              IKANIPTRIAF VSSKIDSRTILDQ GAE+LLG GDML+  P +  P RVHGAF  D+EV
FtsK:    1168 IKANIPTRIAFTVSSKIDSRTILDQAGAESLLGMGDMLYSGPNSTLPVRVHGAFVRDQEV 1227

ORF58ng:  823 HRVVEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDGETDPMYDEAVSVVLKTRKASISG 882
              H VV+  K  G P YVD I S   SE   G G  G  E DP++D+AV  V + RKASISG
FtsK:    1228 HAVVQDWKARGRPQYVDGITSDSESEGGAG-GFDGAEELDPLFDQAVQFVTEKRKASISG 1286

ORF58ng:  883 VQRALRIGYNRAARLIDQMEAEGIVSAPEHNGNRTILVP 921
              VQR  RIGYNRAAR+I+QMEA+GIVS   HNGNR +L P
FtsK:    1287 VQRQFRIGYNRAARIIEQMEAQGIVSEQGHNGNREVLAP 1325
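
The ATP/GTP-binding site motif A (P-loop) mentioned above can be located with a simple pattern search. The text does not say which program was used, so the following is only a sketch: it assumes the PROSITE consensus for this motif, [AG]-x(4)-G-K-[ST] (PS00017), written as a regular expression. The function name `find_p_loops` and the `offset` parameter are hypothetical.

```python
import re

# PROSITE PS00017 consensus [AG]-x(4)-G-K-[ST] as a regular expression.
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_p_loops(protein, offset=1):
    """Return (1-based start, matched text) for each non-overlapping hit."""
    return [(m.start() + offset, m.group()) for m in P_LOOP.finditer(protein)]


# Residues 581-600 of the partial gonococcal sequence (SEQ ID 494).
fragment = "GKAPHLLVAGTTGSGKSVGV"
print(find_p_loops(fragment, offset=581))
```

On this fragment the pattern matches GTTGSGKS starting at residue 590, which lies in the stretch that is identical between ORF58ng and FtsK in the alignment above.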





Further work on ORF58ng revealed the complete gonococcal DNA sequence to be <SEQ ID 495>: 

1 ATGTTTTGGA TAGTTTTGAT CGTTATtgtg TTGCTTGCGC TTGCCGGCCT 

51 GTTTTTTGTC CGCGCACAAT CCGAACGCGA GTGGATGCGC GAGGTTTCTG 

101 CGTGGCAGGA AAAGAAAGGG GAAAAACAGG CGGAGCTGCC TGAAATCAAA 

151 GACGGTATGC CCGATTTTCC CGAGTTTTCC CTGATGCTTT TCCATGCCGT 

201 CAAAACGGCA GTGTATTGGC TGTTTGTCGG TGTCGTCCGT TTCTGCCGAA 

251 ACTATCTGGC GCACGAATCC GAACCGGACA GGCCCGTTCC GCCTGCTTCT 

301 GCAAACCGTG CGGATGTTCC GACCGCATCC GACGGGTATT CAGACAGTGG 

351 AAACGGGACG GAAGAAGCGG AAACGGAAGC AGCAGAAGCT GCGGAGGAAG 

401 AGGCTGCCgA TACgGAAGAC ATTGCAACTG CCGTAATCGA CAACCGCCGC 

451 ATCCcatTCG ACCGGAGTAT TGCTGAAGGG TTGATGCAGT CTGAAAGCAA 

501 AACTTCGCCC GTCCGTCCGG TTTTTAAGGA AATCACTTTG GAAGAAGCAA 

551 CGCGTGCTTT AAGCAGCGCG GCTTTAAGGG AAACGAAAAA ACGCTATATC 

601 GATGCATTTG AGAAAAACGG AACAGCCGTC CCCAAAGTAC GCGTGTCCGA 

651 TACCCCGATG GAAGGGCTGC AGATTATCGG TTTGGACGAC CCTGTGCTTC 

701 AACGCACGTA TTCCCGTATG TTTGATGCGG ACAAAGAAGC GTTTTCCGAG 

751 TCTGCGGATT ACGGATTTGA GCCGTATTTT GAGAAGCAGC ATCCGTCTGC 

801 CTTTTCTGCA GTCAAAGCCG AAAATGCACG GAATGCGCCG TTCCGCCGTC 

851 ATGCAGGGCA GGAGAAAGGG CAGGCGGAGG CAAAATCCCC GGATGTTTCC 

901 CAAGGGCAGT CCGTTTCAGA CGGCACAGCC GTCCGCGATG CCCGCCGCCG 

951 CGTTTCCGTC AATTTGAAAG AACCGAACAA GGCAACGGTT TCTGCGGAGG 

1001 CGCGGATTTC GCGCCTGATT CCGGAAAGTC GGACGGTTGT CGGGAAACGG 

1051 GATGTCGAAA TGCCGTCTGA AACCGAAAAT GTTTTCACGG AAACCGTTTC 

1101 GTCTGTGGGA TACGGCGGTC CGGTTTATGA TGAAGCTGCC GATATCCATA 

1151 TTGAAGAGCC TGCCGCGCCC GATGCTTGGG TGGTCGAACC ACCCGAAGTG 

1201 CCGGAGGTAG CCGTACCCGA AATCGATATT CTGCCGCCGC CTCCCGTATC 

1251 GGAAATCTAC AACCGTACCT ATGAGCCGCC GGCAGGATTC GAGCAGGCGC 

1301 AACGCAGCCG CATTGCCGAA ACCGACCATC TTGCCGCTGA TGTTTTGAAT 

1351 GGAGGTTGGC AGGAGGAAAC CGCCGCTATT GCAGATGACG GCAGTGAGGG 

1401 TGCGGCAGAG CGGTCAAGCG GGCAATATCT GTCGGAAACC GAAGCGTTCG 

1451 GGCATGACAG TCAGGCGGTT TGTCCGTTTG AAGATGTGCC GTCTGAACGC 

1501 CCGTCCTGCC GGGTATCGGA TACGGAAGCG GATGAAGGGG CGTTCCAATC 

1551 GGAAGAGACC GGTGCGGTAT CCGAACACCT GCCGACAACC GACCTGCTTC 

1601 TGCCTCCGCT GTTCAATCCC GAGGCGACGC AAACCGAAGA AGAACTGTTG 

1651 GAAAACAGCA TCACCATCGA AGAAAAATTG GCGGAGTTCA AAGTCAAGGT 

1701 CAAGGTTGTC GATTCTTATT CCGGCCCCGT GATTACGCGT TATGAAATCG 

1751 AACCCGATGT CGGCGTGCGC GGCAATTCCG TTCTGAATTT GGAAAAAGAC 

1801 TTGGCGCGTT CGCTCGGCGT GGCTTCCATC CGCGTTGTCG AAACCATCCC 

1851 CGGCAAAACC TGCATGGGTT TGGAACTTCC GAACCCGAAA CGCCAAATGA 

1901 TACGCCTGAG CGAAATTTTC AATTCGCCCG AGTTTGCCGA ATCCAAATCC 

1951 AAGCTGACGC TCGCGCTCGG TCAGGACATT ACCGGACAGC CCGTCGTAAC 

2001 CGACTTGGGC AAAGCACCGC ATTTGCTGGT TGCCGGCACG ACCGGTTCGG 

2051 GCAAATCGGT GGGTGTCAAC GCGATGATTC TGTCTATGCT TTTCAAAGCC 

2101 GCGCCGGAAG ACGTGCGTAT GATTATGATC GATCCGAAAA TGCTGGAATT 

2151 GAGCATTTAC GAAGGCATCA CGCACCTGCT CGCCCCTGTC GTTACCGATA 

2201 TGAAGCTGGC GGCAAACGCG CTGAACTGGT GTGTTAACGA AATGGAAAAA 

2251 CGCTACCGCC TGATGAGCTT TATGGGCGTG CGCAATCTTG CGGGCTTCAA 

2301 CCAAAAAATC GCCGAAGCCG CAGCAAGGGG AGAAAAAATC GGCAATCCGT 

2351 TCAGCCTCAC GCCCGACGAT CCCGAACCTT TGGAAAAACT GCCGTTTATC 

2401 GTGGTCGTGG TCGATGAGTT TGCCGATTTG ATGATGACGG CAGGCAAGAA 

2451 AATCGAAGAA CTGATTGCGC GCCTCGCCCA AAAAGCCCGC GCGGCAGGCA 

2501 TCCACCTTAT CCTTGCCACA CAACGCCCCA GCGTCGATGT CATCACGGGT 

2551 CTGATTAAGG CGAACATCCC GACGCGTATC GCGTTCCAAG TGTCCAGCAA 

2601 AATCGACAGC CGCACGATTC TCGACCAAAT GGGCGCGGAA AACCTGCTCG 

2651 GTCAGGGCGA TATGCTGTTC CTGCCGCCGG GTACTGCCTA TCCGCAGCGC 

2701 GTTCACGGCG CGTTTGCCTC GGATGAAGAG GTGCACCGCG TGGTCGAATA 

2751 TCTGAAGCAG TTTGGCGAGC CGGACTATGT TGACGATATT TTGAGCGGCG 

2801 GCGGCAGCGA AGAGCTGCCC GGCATCGGGC GCAGCGGCGA CGGCGAAACC 

2851 GATCCGATGT ACGACGAGGC CGTATCCGTT GTCCTGAAAA CGCGCAAAGC 

2901 CAGCATTTCG GGCGTACAGC GCGCCTTGCG CATCGGCTAC AACCGCGCCG 

2951 CGCGTCTGAT TGACCAAATG GAAGCGGAAG GCATTGTGTC CGCACCGGAA 

3001 CACAACGGCA ACCGTACGAT TCTCGTCCCC TTGGACAATG CTTGA 

This corresponds to the amino acid sequence <SEQ ID 496; ORF58ng-1>: 

1 MFWIVLIVIV LLALAGLFFV RAQS EREWMR EVSAWQEKKG EKQAELPEIK 

51 DGMPDFPEFS LM LFHAVKTA VYWLFVGVV R FCRNYLAHES EPDRPVPPAS 

101 ANRADVPTAS DGYSDSGNGT EEAETEAAEA AEEEAADTED IATAVIDNRR 

151 IPFDRSIAEG LMQSESKTSP VRPVFKEITL EEATRALSSA ALRETKKRYI 

201 DAFEKNGTAV PKVRVSDTPM EGLQIIGLDD PVLQRTYSRM FDADKEAFSE 

251 SADYGFEPYF EKQHPSAFSA VKAENARNAP FRRHAGQEKG QAEAKSPDVS 

301 QGQSVSDGTA VRDARRRVSV NLKEPNKATV SAEARISRLI PESRTWGKR 

351 DVEMPSETEN VFTETVSSVG YGGPVYDEAA DIHIEEPAAP DAWVVEPPEV 

401 PEVAVPEIDI LPPPPVSEIY NRTYEPPAGF EQAQRSRIAE TDHLAADVLN 

451 GGWQEETAAI ADDGSEGAAE RSSGQYLSET EAFGHDSQAV CPFEDVPSER 

501 PSCRVSDTEA DEGAFQSEET GAVSEHLPTT DLLLPPLFNP EATQTEEELL 

551 ENSITIEEKL AEFKVKVKVV DSYSGPVITR YEIEPDVGVR GNSVLNLEKD 

601 LARSLGVASI RVVETIPGKT CMGLELPNPK RQMIRLSEIF NSPEFAESKS 

651 KLTLALGQDI TGQPWTDLG KAPHLLVAGT TGSGKSVGVN AMILSMLFKA 

701 APEDVRMIMI DPKMLELSIY EGITHLLAPV VTDMKLAANA LNWCVNEMEK 

751 RYRLMSFMGV RNLAGFNQKI AEAAARGEKI GNPFSLTPDD PEPLEK LPFI 

801 VWVDEFADL MMT AGKKIEE LIARLAQKAR AAGIHLILAT QRPSVDVITG 

851 LIKANIPTRI AFQVSSKIDS RTILDQMGAE NLLGQGDMLF LPPGTAYPQR 

901 VHGAFASDEE VHRWEYLKQ FGEPDYVDDI LSGGGSEELP GIGRSGDGET 

951 DPMYDEAVSV VLKTRKASIS GVQRALRIGY NRAARLIDQM EAEGIVSAPE 

1001 HNGNRTILVP LDNA* 
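
A transmembrane region of the kind predicted for these sequences can be illustrated with a hydropathy scan. The text does not identify the prediction program, so the sketch below is an assumption-laden illustration rather than the original method: it uses the Kyte-Doolittle hydropathy scale with a 19-residue window and flags windows whose mean exceeds 1.6, a commonly used heuristic. The function names and the default parameters are hypothetical choices.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD[residue] for residue in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(protein) - window + 1)
    ]


def hydrophobic_windows(protein, window=19, threshold=1.6):
    """1-based start positions of windows above the threshold."""
    profile = hydropathy_profile(protein, window)
    return [i + 1 for i, mean in enumerate(profile) if mean > threshold]


# First 40 residues of ORF58ng-1 (SEQ ID 496).
n_terminus = "MFWIVLIVIVLLALAGLFFVRAQSEREWMREVSAWQEKKG"
print(hydrophobic_windows(n_terminus))
```

With these settings the windows at the extreme N-terminus score well above the threshold, while windows over the charged stretch that follows score below zero. Sequences containing X or other non-standard symbols would need to be handled before calling the function.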

ORF58ng-1 and ORF58-1 show 97.2% identity in 1014 aa overlap: 

10 20 30 40 50 60 

orf 58-1 . pep MFWIVLIVILLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPELA 
I I I I I I I I I : t I I I I I I I II I I I I I I I I I I I I I I I I I f I I I I I I I M I I I I I I I I || | : : 
orf 58ng-l MFWIVLIVIVLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPEFS 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 58-1 . pep LMLFHAVKTAVYWLFVGWRFCRNYLAHESEPDRPVPPASANRADVPTAS DGYSDSGNGT 
I I I t H 1 I I I I I I I I I f I t 1 I I I I I I I I I I I I I I I II I I I I I ( I I I I I I I I I I [ I I I I I I 
orf58ng-1 LMLFHAVKTAVYWLFVGWRFCRNYLAHESEPDRPVPPASANRADVPTASDGYSDSGNGT 
70 80 90 100 110 120 

130 140 150 160 170 180 

orf58-l pep EEAETEEAEAAEEEAADTEDIATAVIDNRRIPFDRSIAEGLMPSESEISPVRPVFKEITL 
illld | | | ! I I I I I I M I I I I I II II 1 I I I I I M I H I I I Ml: llliliilMil 
orf58ng-1 EEAETEAAEAAEEEAADTEDIATAVIDNRRIPFDRSIAEGLMQSESKTSPVRPVFKEITL 
130 140 150 160 170 180 



190 200 210 220 230 240 

orf 58-1 pep EEATRALNSAALRETKKRYIDAFEKNETAVPKVRVSDTPMEGLQIIGLDDPVLQRTYSHM 
| | 1 | 1 1 ) : M I I ! ! I I I I I I i i I i i I I 1 1 i I I I I I I 1 i M I i 1 i I ( M I I I I t I 1 I I : I 
orf58ng-l EEATRALSSAALRETKKRYIDAFEKNGTAVPKVRVSDTPMEGLQIIGLDDPVLQRTYSRM 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 58-1 pep FDADKEAFSESADYGFEPYFEKQHPSAFSAVKAENARNAPFHRHAGQGKGQAEAKSPDVS 
| i I I I I I I I I I I I I I I I 11 I I I I 1 I I I I I I I 1 I I M I ! M ): 1 I I I I I I I 1 I I I I I 1 I I 
orf58ng-l FDADKEAFSE S AD YGFE P Y FEKQH P SAFSAVKAEN ARN AP FRRHAGQEKGQAEAKS PDVS 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 58-1 . pep QGQSVSDGTAVRDARRRVSVNLKEPNKATVSAEARISRLIPESQTWGKRDVEMPSETEN 
I | ! I I I I I II I I I I f M 1 M I II i I I I i I I i I I i I I i I I I t I I : i M t t I It I I t I I I M 
orf58ng-l QGQSVSDGTAVRDARRRVSVNLKEPNKATVSAEARISRLIPESRTWGKRDVEMPSETEN 

310 320 330 340 350 360 






370 380 390 400 410 420 

orf 58-1. pep VFTETVSSVGYGGPVYDETADIHIEEPAAPDAWWEPPEVPKVPMTAIDIQPPPPVSEIY 
I I I M I I I I I I I I I I II I : I I I I I I I I I I I I I 1 I I I I I I I I : I : 111 I I I I I I I I I 
orf58ng-l VFTETVSSVGYGGPVYDEAADIHIEEPAAPDAWWEPPEVPEVAVPEIDILPPPPVSEIY 

370 380 390 400 410 420 






430 440 450 460 470 480 

orf 58-1 . pep NRTYEPPSGFEQVQRSRIAETDHLADDVLNGGWQEETAAIADDGSEGAAERSSGQYLSET 
i I M I I I : I I I I : I II I I I I I I I I I i I I I I I I I I I I I I I I 1 I I I I! I i I i I i M I I M I 
orf58ng-l NRTYEPPAGFEQAQRSRIAETDHLAADVLNGGWQEETAAIADDGSEGAAERSSGQYLSET 

430 440 450 460 470 480 

490 500 510 520 530 540 

orf 58-1 . pep EAFGHDSQAVCPFENVPSERPSCRVSDTEADEGAFPSEETGAVSEHLPTTDLLLPPLFNP 
I I I I I I I I I I I I I I : I I I I II I I I I I I I I I I I I 1 I I I I I I I I I I II I M I I I I I I I I I I 
orf58ng-l EAFGHDSQAVCPFE DVPSERPSCRVSDTEADEGAFQSEETGAVSEHLPTTDLLLPPLFNP 

490 500 510 520 530 540 

550 560 570 580 590 600 

orf 58-1. pep EATQTEEELLENSITIEEKLAEFKVKVKWDSYSGPVITRYEIEPDVGVRGNSVLNLEKD 
I I I I I I I I I I I I I I I I I i I M I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I M t I I I I 
orf58ng-l EATQTEEELLENSITIEEKLAEFKVKVKWDSYSGPVITRYEIEPDVGVRGNSVLNLEKD 

550 560 570 580 590 600 






610 620 630 640 650 660 

orf 58-1. pep LARSLGVASIRWETIPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDI 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II M I I I I I I I I I I I I I I I I I I 
orf58ng-l LARSLGVASIRWETIPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDI 

610 620 630 640 650 660 

670 680 690 700 710 720 

orf 58-1. pep TGQPWTDLGKAPHLLVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIY 
I I i I I I ( I I I I I I I I I I I I i I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I 
orf58ng-l TGQPVVTDLGKAPHLLVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIY 

670 680 690 700 710 720 

730 740 750 760 770 780 

orf 58-1 . pep EG I PH LLAP WT DMKLAAN ALNWC VNEME KR YRLM S FMG VRN LAG FN QK I AE AAARGEK I 
III I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I M I II I I 
orf 58ng-l EGITHLLAPVVTDMKLAANALNWCVNEMEKRYRLMSFMGVRNLAGFNQKIAEAAARGEKI 

730 740 750 760 770 780 



790 800 810 820 830 840 

orf58-1 pep GNPFSLTPDDPEPLEKLPFIWWDEFADLMMTAGKKIEELIARLAQKARAAGIHLILAT 
| f ( | | | | | t I t I I I I I t i ! I I I I I M ! I I I I I I I I II I I I M M I I I II I I I I ! I I M I I 
orf58ng-l GNPFSLTPDDPEPLEKLPFIWWDEFADLMMTAGKKIEELIARLAQKARAAGIHLILAT 
790 800 810 820 830 840 

850 860 870 880 890 900 

orf 58-1. pep QRPSVDVITGLIKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLLPGTAYPQR 

iiiiitiiMiiMiiimnmiiiimimiiiiMiimiiM minim 

orf58ng-l QRPSVDVITGLIKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLPPGTAYPQR 
850 860 870 880 890 900 

910 920 930 940 950 960 

orf 58-1. pep VHGAFASDEEVHRVVEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDDETDPMYDEAVSV 
I M I M II I M I I M I M I I M I M M I II I M I I M M M M M M I M II M I M II 
orf58ng-l VHGAFASDEEVHRWEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDGETDPMYDEAVSV 

910 920 930 940 950 960 






970 980 990 1000 1010 

orf 58-1. pep VLKTRKASISGVQRALRIGYNRAARLIDQMEAEGIVSAPEHNGNRTILVPLDNAX 
I I M I I I It II II I I I I M I II I M I I M M M I I II I II II M I II I I I M M I 
orf58ng-l VLKTRKAS I SGVQRALRIGYNRAARLI DQMEAEGIVSAPEHNGNRT I LVPLDNAX 

970 980 990 1000 1010 
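
The identity figure quoted for an alignment such as the one above is the share of aligned positions that carry the same residue. The sketch below shows one way to compute it from two aligned strings; it is not the program used in the original analysis, and it assumes identity is counted over columns in which neither sequence has a gap (programs differ on this convention). The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return (percent identity, number of compared columns)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Only columns without a gap in either sequence are compared.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns), len(columns)


# First 60 residues of ORF58-1 and ORF58ng-1 from the alignment above.
orf58_1 = "MFWIVLIVILLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPELA"
orf58ng_1 = "MFWIVLIVIVLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPEFS"
print(percent_identity(orf58_1, orf58ng_1))
```

Over this first row three of the 60 positions differ, giving 95% identity for the row; the 97.2% figure reported above refers to the full 1014 aa overlap.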






Furthermore, ORF58ng-1 shows significant homology to the E. coli protein FtsK: 

sp|P46889|FTSK_ECOLI CELL DIVISION PROTEIN FTSK >gi|1651412|gnl|PID|d1015290 (Dl 
division protein FtsK [Escherichia coli] >gi|1651418|gnl|PID|d1015296 (D90727) Cell 
division protein FtsK [Escherichia coli] >gi|1787117 (AE000191) cell division 
protein FtsK [Escherichia coli] Length = 1329 
Score = 576 bits (1469), Expect = e-163 

Identities = 301/459 (65%), Positives = 353/459 (76%), Gaps = 5/459 (1%) 

Query: 556  IEEKLAEFKVKVKVVDSYSGPVITRYEIEPDVGVRGNSVLNLEKDLARSLGVASIRVVET 615
            +E +LA+F++K  VV+   GPVITR+E+    GV+   + NL +DLARSL   ++RVVE
Sbjct: 868  VEARLADFRIKADVVNYSPGPVITRFELNLAPGVKAARISNLSRDLARSLSTVAVRVVEV 927

Query: 616  IPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDITGQPVVTDLGKAPHL 675
            IPGK  +GLELPN KRQ + L E+ ++ +F ++ S LT+ LG+DI G+PVV DL K PHL
Sbjct: 928  IPGKPYVGLELPNKKRQTVYLREVLDNAKFRDNPSPLTVVLGKDIAGEPVVADLAKMPHL 987

Query: 676  LVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIYEGITHLLAPVVTDMK 735
            LVAGTTGSGKSVGVNAMILSML+KA PEDVR IMIDPKMLELS+YEGI HLL  VVTDMK
Sbjct: 988  LVAGTTGSGKSVGVNAMILSMLYKAQPEDVRFIMIDPKMLELSVYEGIPHLLTEVVTDMK 1047

Query: 736  LAANALNWCVNEMEKRYRLMSFMGVRNLAGFNQKIAEAAARGEKIGNPFSLTPDDPEP-- 793
             AANAL WCVNEME+RY+LMS +GVRNLAG+N+KIAEA      I +P+    D  +
Sbjct: 1048 DAANALRWCVNEMERRYKLMSALGVRNLAGYNEKIAEADRMMRPIPDPYWKPGDSMDAQH 1107

Query: 794  --LEKLPFIVVVVDEFADLMMTAGKKIEELIARLAQKARAAGIHLILATQRPSVDVITGL 851
              L+K P+IVV+VDEFADLMMT GKK+EELIARLAQKARAAGIHL+LATQRPSVDVITGL
Sbjct: 1108 PVLKKEPYIVVLVDEFADLMMTVGKKVEELIARLAQKARAAGIHLVLATQRPSVDVITGL 1167

Query: 852  IKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLPPGTAYPQRVHGAFASDEEV 911
            IKANIPTRIAF VSSKIDSRTILDQ GAE+LLG GDML+  P +  P RVHGAF  D+EV
Sbjct: 1168 IKANIPTRIAFTVSSKIDSRTILDQAGAESLLGMGDMLYSGPNSTLPVRVHGAFVRDQEV 1227

Query: 912  HRVVEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDGETDPMYDEAVSVVLKTRKASISG 971
            H VV+  K  G P YVD I S   SE   G G  G  E DP++D+AV  V + RKASISG
Sbjct: 1228 HAVVQDWKARGRPQYVDGITSDSESEGGAG-GFDGAEELDPLFDQAVQFVTEKRKASISG 1286

Query: 972  VQRALRIGYNRAARLIDQMEAEGIVSAPEHNGNRTILVP 1010
            VQR  RIGYNRAAR+I+QMEA+GIVS   HNGNR +L P
Sbjct: 1287 VQRQFRIGYNRAARIIEQMEAQGIVSEQGHNGNREVLAP 1325
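
The percentages in the alignment summary above follow directly from the counts. As a hedged illustration, the sketch below parses a summary line of that form and checks each percentage against its count; it assumes the report truncates the percentage to a whole number rather than rounding, which is consistent with the figures shown (301/459 is 65.6%, reported as 65%). The function name `check_summary` is hypothetical.

```python
import re

# Matches fields such as "Identities = 301/459 (65%)".
FIELD = re.compile(r"(\w+) = (\d+)/(\d+) \((\d+)%\)")


def check_summary(line):
    """Map each field name to True if its percentage matches count/length."""
    results = {}
    for name, count, length, percent in FIELD.findall(line):
        # Integer division truncates, e.g. 301 * 100 // 459 == 65.
        results[name] = int(count) * 100 // int(length) == int(percent)
    return results


summary = "Identities = 301/459 (65%), Positives = 353/459 (76%), Gaps = 5/459 (1%)"
print(check_summary(summary))
```

All three fields of the summary line are consistent under this assumption.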



Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 497>: 

1 ATGATTTATC AAAGAAACCT CATCAAAGAA CTCTCTTTTA CCGCCGTCGG 

51 CATTTTCGTC GTCCTCTTGG CGGTATTGGT CTCCACGCAG GCAATCAACC 

101 TGCTCGGCCG TGCCGCCGAC GGGC..GTGA TCGCCATCGA TGCCGTGTTG 

151 GCATTGGTCG GCTTCTGGGT C - 

// 

901 A TTGCCATCGG TTTGTTTTTA ATTTACCAAA ACGGGCTGAC 

951 CCTGCTTTTT GAAGCCGTGG AAGACGGCAA AATCCATTTT TGGCTCGGAC 

1001 TGCTGCCTAT GCACATTATC ATGTTTGTCC TTGCACTCAT CCTGTTGCGC 

1051 GTCCGCAGTA TGCCCAGCCA GCCCTTCTGG CAGGCGGTTG GCAAAAGTCT 

1101 GACATTGAAA GGCGGAAAAT GA 

This corresponds to the amino acid sequence <SEQ ID 498; ORF101>: 

1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGRAAD GXVIAIDAVL 

51 ALVGFWV 

// 

301 . ..IAIGLFL IYQNGLTLLF EAVEDGKIHF WLGLLPMHII MFVLALILLR 
351 VRSMPSQPFW QAVGKSLTLK GGK* 

Further work revealed the complete nucleotide sequence <SEQ ID 499>: 

1 ATGATTTATC AAAGAAACCT CATCAAAGAA CTCTCTTTTA CCGCCGTCGG 

51 CATTTTCGTC GTCCTCTTGG CGGTATTGGT CTCCACGCAG GCAATCAACC 

101 TGCTCGGCCG TGCCGCCGAC GGGCGTGTCG CCATCGATGC CGTGTTGGCA 

151 TTGGTCGGCT TCTGGGTCAT CGGTATGACG CCGCTTTTGC TGGTGTTGAC 

201 CGCATTTATC AGTACGTTGA CCGTGTTGAC CCGCTACTGG CGCGACAGCG 

251 AAATGTCGGT CTGGCTATCC TGCGGATTGG CATTGAAACA ATGGATACGC 

301 CCGGTGATGC AGTTTGCCGT GCCGTTTGCC GTTTTGGTTG CCGTCATGCA 

351 GCTTTGGGTG ATACCGTGGG CAGAGCTACG CAGCCGCGAA TACGCTGAAA 

401 TCCTGAAGCA GAAGCAGGAA TTGTCTTTGG TGGAGGCAGG CGAGTTCAAC 

451 AGTTTGGGCA AGCGCAACGG CAGGGTTTAT TTTGTCGAAA CCTTCGATAC 

501 CGAATCCGGC ATCATGAAAA ACCTGTTCCT GCGCGAACAG GACAAAAACG 

551 GCGGCGACAA CATCATCTTC GCCAAAGAAG GTAACTTCTC GCTGAACGAC 

601 AACAAACGCA CGCTCGAATT GCGCCACGGC TACCGTTACA GCGGCACGCC 

651 CGGACGCGCC GACTACAATC AGGTTTCCTT CCAAAAACTC AACCTGATTA 

701 TCAGCACCAC GCCCAAACTC ATCGACCCCG TTTCCCACCG CCGTACCATT 

751 CCGACCGCCC AACTGATTGG CAGCAGCAAC CCGCAACATC AGGCGGAATT 

801 GATGTGGCGC ATCTCGCTGA CCGTCAGCGT CCTCCTACTC TGCCTGCTTG 

851 CCGTGCCGCT TTCCTATTTC AACCCGCGCA GCGGACATAC CTACAATATC 

901 TTGATTGCCA TCGGTTTGTT TTTAATTTAC CAAAACGGGC TGACCCTGCT 

951 TTTTGAAGCC GTGGAAGACG GCAAAATCCA TTTTTGGCTC GGACTGCTGC 

1001 CTATGCACAT TATCATGTTT GCCGTTGCAC TCATCCTGTT GCGCGTCCGC 

1051 AGTATGCCCA GCCAGCCCTT CTGGCAGGCG GTTGGCAAAA GTCTGACATT 

1101 GAAAGGCGGA AAATGA 
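
Sequence listings in this layout (a position number followed by blocks of ten residues) can be joined into one continuous string before further analysis. The sketch below is an illustration under stated assumptions, not a tool referenced by the text: it assumes each line begins with a single numeric position token and that every other token is sequence. The function name `parse_numbered_block` is hypothetical.

```python
def parse_numbered_block(text):
    """Join a numbered, space-separated sequence listing into one string."""
    residues = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines between rows
        if fields[0].isdigit():
            fields = fields[1:]  # drop the leading position number
        residues.extend(fields)
    return "".join(residues)


listing = """1 ATGATTTATC AAAGAAACCT

1101 GAAAGGCGGA AAATGA"""
print(parse_numbered_block(listing))
```

A quick consistency check follows from the layout: the last row of SEQ ID 499 starts at position 1101 and holds 16 bases, so the sequence is 1116 bases long, or 372 codons, which agrees with the 371 residues plus stop codon listed for ORF101-1.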

This corresponds to the amino acid sequence <SEQ ID 500; ORF101-1>: 

1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ A INLLGRAAD GRVAIDAVLA 

51 LVGFWVIGMT PLLLV LTAFI STLTVLTRYW RDSEMSVWLS CGLALKQWIR 

101 PVMQ FAVPFA VLVAVMQLWV I PWAELRSRE YAEILKQKQE LSLVEAGEFN 

151 SLGKRNGRVY FVETFDTESG IMKNLFLREQ DKNGGDNIIF AKEGNFSLND 

201 NKRTLELRHG YRYSGTPGRA DYNQVSFQKL NLIISTTPKL IDPVSHRRTI 

251 PTAQLIGSSN PQHQAELMWR ISLTVSVLLL CLLAVPL SYF NPRSGHTYNI 

301 LIAIGLFLIY QNGLTL LFEA VEDGKIHFWL GLLPMHIIMF AVALILL RVR 

351 SMPSQPFWQA VGKSLTLKGG K* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF101 shows 91.2% identity over a 57 aa overlap and 95.7% identity over a 69 aa overlap with 
an ORF (ORF101a) from strain A of N. meningitidis: 

10 20 30 40 50 

orflOl pep MIYQRNLIKELSFTAVGIFWLLAVLVSTQAINLLGRAADGXVIAIDAVLALVGFWVX 

niMiiiiiiiiniMiMiMiimiimii m 1 1 1 i 1 1 1 u 1 1 1 1 1 

orflOla MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGXAADXRX-AIDAVLALVGFWVXXM 
10 20 30 40 50 

// 

90 100 110 

orflOl pep IAIGLFLIYQNGLTLLFEAVEDGKIHFWLGL 

I I I I t I I t t I I I I t t II I I I t I t II I I 1 I I 
orflOla LTVSVLLLCLLAVPLSYFNPRSGHTYNILXAIGLFLIYQNGLTLLFEAVEDGKIHFWLGL 
280 290 300 310 320 330 

120 130 140 150 

orflOl. pep LPMHIIMFVLALILLRVRSMPSQPFWQAVGKSLTLKGGKX 
I | | | i i I I I : I : : i I t M II I I I I t I I I t I I I ! I I I I I I 1 
orflOla LPMHIIMFVIAIVLLRVRSMPSQPFWQAVGKSLTLKGGKX 
340 350 360 370 

The complete length ORF101a nucleotide sequence <SEQ ID 501> is:

1 ATGATTTATC AAAGAAACCT CATCAAAGAA CTCTCTTTTA CCGCCGTCGG 

51 CATTTTCGTC GTCCTCTTGG CGGTATTGGT CTCCACGCAG GCAATCAACC 

101 TGCTCGGCCN TGCCGCCGAC NGGCGTNTCG CCATCGATGC CGTGTTGGCA 

151 TTGGTCGGCT TCTGGGTCNN NNGNATGACG CCGCTTTTGC TNGTGTTGAC 

201 CGCATTTATC AGTACGTTGA CCGTGTTGAC CCGCTACTGG CGNGACAGCG 

251 AAATGTCGGT CTGGNTATCC TGCGGATTGG CATTGAAACA ATGGATACGC 

301 CCGGTGATGC AGTTTGCCGT GCCGTTTGCC GTTTTGGTTG CCGTCATGCA 

351 GCTTTGGGTG ATACCGTGGG CAGAGCTACG CAGCCGCGAA TACGCTGAAA 

401 TCCTGAAGCA GAAGCAGGAA TTGTCTTTGG TGGAGGCAGG CGGGTTCAAC 

451 AGTTTGGGCA AGCGCAACGG CAGGGTTTAT TTTGTCGAAA CCTTCGATAC 

501 CGAATCCGGC ATCATGAAAA ACCTGTTCCT GCGCGAACAG GACAAAAACG 

551 GCGGCGACAA CATCATCTTC NCCAAAGAAA GTAACTTCTC GCTGAACGAC 

601 AACAAACGCA CGCTCGAATT GCGCCACGGC TACCGTTACA GCGGCACGCC 

651 CGGACGCGCC GACTACAATC AGGTTTCCTT CCNAAAACTC AACCTGATTA 

701 TCAGCACCAC GCCCAAACTC ATCGACCCCG TTT CCCACCG CCGTACNATN 

751 CCNACNGCCC AACTGATTGG CAGCAGCAAC CCGCAACATC ANGCGGAATT 

801 GATGTGGCGC ATCTCGCTGA CCGTCAGCGT CCTCCTACTC TGCCTGCTTG 

851 CCGTGCCGCT TTCCTATTTC AACCCGCGCA GCGGACATAC CTACAATATC 

901 TTGANTGCCA TCGGTTTGTT TTTAATTTAC CAAAACGGGC TGACCCTGCT 

951 TTTTGAAGCC GTGGAAGACG GCAAAATCCA TTTTTGGCTC GGACTGCTGC 

1001 CTATGCACAT CATCATGTTC GTCATCGCAA TCGTACTTCT GCGCGTCCGC 

1051 AGCATGCCCA GCCAGCCCTT CTGGCAGGCG GTTGGCAAAA GTCTGACATT 

1101 GAAAGGCGGA AAATGA 

This encodes a protein having amino acid sequence <SEQ ID 502>: 

1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ A INLLGXAAD XRXAIDAVLA 

51 LVGFWVXXMT PLLL VLTAFI STLTVLTRYW RDSEMSVWXS CGLALKQWIR 

101 PVMQ FAVPFA VLVAVMQLWV I PWAELRSRE YAEILKQKQE LSLVEAGGFN 

151 SLGKRNGRVY FVETFDTESG IMKNLFLREQ DKNGGDNIIF XKESNFSLND 

201 NKRTLELRHG YRYSGTPGRA DYNQVSFXKL NLIISTTPKL IDPVSHRRTX 

251 PTAQLIGSSN PQHXAELMWR ISLTVSVLLL CLLAVPL SYF NPRSGHTYNI 

301 LXAIGLFLIY QNGLTL LFEA VEDGKIHFWL GLLPMHIIMF VIAIVLL RVR 

351 SMPSQPFWQA VGKSLTLKGG K* 

ORF101a and ORF101-1 show 95.4% identity in 371 aa overlap:

orflOla . pep M I YQRN L I KE L S FT AVG I FVV L L AV L VS TQA IN L LGX AADXRXA I D AV L AL VG FWVXXMT 60 

I I I I I I I I I I I I I M II I I I I II I II I I I I II I I I I III I I I I I I I I I I I I I I II 
orf 101-1 MIYQRNLIKELSFTAVGIFWLLAVLVSTQAINLLGRAADGRVAIDAVLALVGFWVIGMT 60 

orflOla . pep PLLLVLTAFI STLTVLTRYWRDSEMSVWXSCGLALKQWIRPVMQFAVPFAVLVAVMQLWV 120 

I i I I I I II I I I I I I I I I II I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I 1 I 
orf 101-1 PLLLVLTAFI ST LTVLTRYWRDSEMSVWLSCGLALKQW I RPVMQFAVPFAVLVAVMQLWV 120 

orflOla. pep I PWAE LRS RE YAE I LKQKQE L S L VE AGG FN S LG KRNG RV Y FVE T FDTE S G I MKN L FLRE Q 180 

I II I I I I 1 I I I I I I I II I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I II I I I I I I I I 
orflOl- 1 IPWAELRSREYAEILKQKQELSLVEAGEFNSLGKRNGRVYFVETFDTESGIMKNLFLREQ 180 
orf101a.pep DKNGGDNIIFXKESNFSLNDNKRTLELRHGYRYSGTPGRADYNQVSFXKLNLIISTTPKL 240

I i I i I t I i I j t I : 11 ! I 1 t I I I I I I t I I I I I I I I I I I I I M I II I I I M I I I I I I I I I 
orf 101-1 DKNGGDNIIFAKEGNFSLNDNKRTLELRHGYRYSGTPGRADYNQVSFQKLNLIISTTPKL 24 0 

5 orf 101a . pep IDPVSHRRTXPTAQLIGSSNPQHXAELMWRISLTVSVLLLCLLAVPLSYFNPRSGHTYNI 300 

I M I I I I I I I 1 I I I I I I t I I I I I I I t I I I I I t I I I I I I I I I I I I I I I I I M I I I I I I I 
orf 101-1 IDPVSHRRTIPTAQLIGSSNPQHQAELMWRISLTVSVLLLCLLAVPLSYFNPRSGHTYNI 300 

orflOla.pep LXAIGLFLIYQNGLTLLFEAVEDGKIHFWLGLLPMHIIMFVIAIVLLRVRSMPSQPFWQA 360 
10 I ! I I I I I I II 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I 

orf 101-1 LIAIGLFLIYQNGLTLLFEAVEDGKIHFWLGLLPMHIIMFAVALILLRVRSMPSQPFWQA 360 

orflOla.pep VGKSLTLKGGK 371 
I I I I I I I I I I I 

15 orfl01-l VGKSLTLKGGK 371 

Homology with a predicted ORF from N. gonorrhoeae

ORF101 shows 96.5% identity in 57aa overlap at the N-terminal domain and 95.1% identity in
61aa overlap at the C-terminal domain, respectively, with a predicted ORF (ORF101ng) from
N. gonorrhoeae:

orf 101 .pep MIYQRNLIKELSFTAVGIFWLLAVLVSTQAINLLGRAADGXVIAIDAVLALVGFWV 57 

I I I I I I I I M I I I I I ! I I II I I ! I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I I I I 
orf lOlng MIYQRNLIKELSFTAVGIFWLLAVLVSTQAINLLGRAADGRV-AIDAVLALVGFWVIGM 59 

25 // 



30 



orf 101 . pep IAIGLFLIYQNGLTLLFEAVEDGKIHFWLG 333 

I I I I I I I I I I I I I I I I M I I I I I I I I I I I I 
orflOlng SLTVSVLLLCLLAVPLSYFNPRSGHTYNILIAIGLFLIYQNGLTLLFEAVEDGKIHFWLG 331 

orf 101 . pep LLPMHI IMFVLALI LLRVRSMPSQPFWQAVGKSLTLKGGK 373 

I I I I I I I I I I : I :: I I I I I M I I I I I I I I I I 
orflOlng LLPMHI IMFVIAIVLLRVRSMPSQPFWQAVG 362 



The ORF101ng nucleotide sequence <SEQ ID 503> is predicted to encode a protein having partial
amino acid sequence <SEQ ID 504>:

1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGRAAD GRVAIDAVLA

51 LVGFWVIGMT PLLL VLTAFI STLTVLTRYW RDSEMSVWLS CGLALKQWIR 

101 PVMQ FAVPFA ILIAVMQLWV I PWAELRSRE YAEILKQKQE LSLVEAGEFN 

151 NLGKRNGRVY FVETFDTESG IMKNLFLREQ DKNGGDNIIF AKEGNFSLKD 

201 NKRTLELRHG YRYSGTPGRA DYNQVSFQKL NLIISTTPKL IDPVSHRRTI

251 STAQLIGSSN FQHQAELMWR ISLTVSVLLL CLLAVPL SYF NPRSGHTYNI 

301 LIAIGLFLIY QNGLTL LFEA VEDGKIHFWL GLLPMHIIMF VIAIVLL RVR 

351 SMPSQPFWQA VG. . . 

Further work revealed the complete nucleotide sequence <SEQ ID 505>: 

1 ATGATTTATC AAAGAAACCT CATCAAAGAA CTCTCTTTTA CCGCCGTCGG

51 CATTTTCGTC GTCCTCTTGG CGGTGTTGGT GTCCACGCAG GCGATCAACC 

101 TGCTTGGCCG CGCAGCTGAC GGGCGTGTCG CCATCGATGC CGTGTTGGCC 

151 TTAGTCGGCT TCTGGGTCAT CGGTATGACC CCGCTTTTGC TGGTGTTGAC 

201 CGCATTCATC AGCACGCTGA CCGTATTGAC CCGCTACTGG CGCGACAGCG 

251 AAATGTCGGT CTGGCTATCC TGCGGATTGG CGTTGAAACA GTGGATACGC

301 CCCGTCATGC AGTTTGCCGT GCCGTTTGCC ATCCTGATTG CCGTCATGCA 

351 GCTTTGGGTG ATACCGTGGG CAGAGCTGCG CAGCCGCGAA TATGCCGAAA 

401 TTTTGAAGCA GAAGCAGGAA TTGTCTTTGG TGGAAGCCGG CGAGTTCAAT 

451 AACTTGGGCA AGCGCAACGG CAgggtttaT TtcgtcgaaA CCTTTGACAC 

501 CGaatccgGC ATCATGAAAA ACCTGTtcct GcGCGAACAG GACAAAAACG

551 gcggcgacaA CATCATCTTC GCcaaaGAag gtaactTctc gctgaaggaC 

601 AACAAAcgca cgctcgaATT GCGCCACGGC TACCGTTACA GCGGcacgcC 

651 CGGacGCGCc gactaCAATC AGGTTtcctt cCAAAAacTc aacctgATta 

701 TCAGCACCAC GCCCAAacTT ATCGaccCCG TTTCCCACCG CCGCACCATT 
751 tcgacCGCCC AAcTGATTGG CAGCAGCAAT CCGCAACATC AGGCAGAATT

801 GATGTGGCGC ATCTCGCTGA CCGTCAGCGT CCTCCTGCTC TGCCTACTCG

851 CCGTGCCGCT TTCCTATTTC AACCCGCGCA GCGGACATAC CTACAATATC

901 TTGATTGCCA TCGGTTTGTT TTTAATTTAC CAAAACGGGC TGACCCTGCT

951 TTTTGAAGCC GTGGAAGACG GCAAAATCCA TTTTTGGCTC GGACTGCTGC

1001 CTATGCACAT CATCATGTTC GTCATCGCAA TCGTACTTCT GCGCGTCCGC

1051 AGTATGCCCA GCCAGCCCTT CTGGCAGGCG GTTGGCAAAA GTCTGACATT

1101 GAAAGgcgGA AAATGA



This corresponds to the amino acid sequence <SEQ ID 506; ORF101ng-1>:

1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGRAAD GRVAIDAVLA

51 LVGFWVIGMT PLLLVLTAFI STLTVLTRYW RDSEMSVWLS CGLALKQWIR

101 PVMQFAVPFA ILIAVMQLWV IPWAELRSRE YAEILKQKQE LSLVEAGEFN

151 NLGKRNGRVY FVETFDTESG IMKNLFLREQ DKNGGDNIIF AKEGNFSLKD

201 NKRTLELRHG YRYSGTPGRA DYNQVSFQKL NLIISTTPKL IDPVSHRRTI

251 STAQLIGSSN PQHQAELMWR ISLTVSVLLL CLLAVPLSYF NPRSGHTYNI

301 LIAIGLFLIY QNGLTLLFEA VEDGKIHFWL GLLPMHIIMF VIAIVLLRVR

351 SMPSQPFWQA VGKSLTLKGG K*

ORF101ng-1 and ORF101-1 show 97.6% identity in 371 aa overlap:

10 20 30 40 50 60 

orf 101-1. pep MIYQRNLIKELSFTAVGIFWLLAVLVSTQAINLLGRAADGRVAIDAVLALVGFWVIGMT 
I | 1 | i M II i I M i I I 11 ! 1 I I I t I i I I ! i M I t I I I I I ! I i I I I I i I t t II t M ( I I I I 
orfl01ng-l MIYQRNLIKELSFTAVGIFWLLAVLVSTQAINLLGRAADGRVAIDAVLALVGFWVIGMT 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 101-1 . pep PLLLVLTAFISTLTVLTRYWRDSEMSVWLSCGLALKQWIRPVMQFAVPFAVLVAVMQLWV 
I I I I I I { i I I i I I I I I I I I I I I 1 I I II I I I I II I I I I f I I I M I I I I I I I : I : I I I I M 1 
orfl01ng-l PLLLVLTAFI STLTVLTRYWRDSEMSWLSCGLALKQWIRPVMQFAVPFAILIAVMQLWV 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 101-1. pep I P W AE LR S RE Y AE I LKQKQE L S LVE AGE FN S LGKRNGRV Y FVET FDTE S G IMKNL FLRE Q 

I I I I I I I I I I I I I I I I I I I I I I I I I II I I I : I I ! I I I I I I M I I I I II I 1 I I I M I I f I I 
orfl01ng-l IPWAELRSREYAEILKQKQELSLVEAGEFNNLGKRNGRVYFVETFDTESGIMKNLFLREQ 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 101-1. pep DKNGGDNIIFAKEGNFSLNDNKRTLELRHGYRYSGTPGRADYNQVSFQKLNLIISTTPKL 

II I I I I I I I I I I I I I I I I : I i I I I I I I I I I I I I I I I II I I I I i I I I I I I I M I I I I I f 1 I 
orfl01ng-l DKNGGDN I I FAKE GN FS LKDNKRT LE LRH G YR Y S GT P GRAD YN Q V S FQKLNLI I STTPKL 

190 200 210 220 230 240 



                      310       320       330       340       350       360
orf101-1.pep   LIAIGLFLIYQNGLTLLFEAVEDGKIHFWLGLLPMHIIMFAVALILLRVRSMPSQPFWQA
               ||||||||||||||||||||||||||||||||||||||||::|::|||||||||||||||
orf101ng-1     LIAIGLFLIYQNGLTLLFEAVEDGKIHFWLGLLPMHIIMFVIAIVLLRVRSMPSQPFWQA
                      310       320       330       340       350       360

                      370
orf101-1.pep   VGKSLTLKGGKX
               ||||||||||||
orf101ng-1     VGKSLTLKGGKX
                      370
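
The identity figures quoted for these alignments can be reproduced by counting matching positions in two aligned rows. The sketch below assumes a gap-free pair of rows of equal length (as in the ORF101-1 / ORF101ng-1 comparison above); `percent_identity` is an illustrative helper name, not a routine from the original analysis, and the two strings are residues 301-360 of the two proteins as listed above.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percentage of aligned columns holding the same residue in both rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    # Gap characters never count as a match.
    matches = sum(1 for a, b in zip(row_a, row_b) if a == b and a != "-")
    return 100.0 * matches / len(row_a)


# Residues 301-360 of ORF101-1 and ORF101ng-1, copied from the sequences above.
ORF101_1 = "LIAIGLFLIYQNGLTLLFEAVEDGKIHFWLGLLPMHIIMFAVALILLRVRSMPSQPFWQA"
ORF101NG_1 = "LIAIGLFLIYQNGLTLLFEAVEDGKIHFWLGLLPMHIIMFVIAIVLLRVRSMPSQPFWQA"

print(f"{percent_identity(ORF101_1, ORF101NG_1):.1f}% identity over {len(ORF101_1)} aa")
```

Only four positions differ within this 60-residue window, all in the AVALI / VIAIV stretch.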



Based on this analysis, including the presence of a putative leader sequence (double-underlined)
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 60 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 507>:

1 GGTGGTGGTT TTATCAATGC TTCCTGTGCC ACTTTGACGA CAGCCAAACC

51 GCAATATCAA GCAGGAGACC TTAGCGCTTT TAAGATAAGG CAAGGCAATG 

101 TTGTAATCGC CGGACACGGT TTGGATGCAC GTGATACCGA TTACACACGT 

151 ATTCTCAGTT ATCATTCCAA AATCGATGCA CCCGTATGGG GACAAGATGT 

201 TCGTGTCGTC GCGGGACAAA ACGATGTGGC CGCAACAGGT GATGCACATT 

251 CGCCTATTCT CAATAATGCT GCTGCCAATA CGTCAAACAA TACAGCCAAC

301 AACGGCACAC ATATCCCTTT ATTTGCGATT GATACAGGCA AATTAGGAGG 

351 TAT . GTATGC CAACAAAATC ACCTTGATCA GTACGGTCGA GCAAGCAGGC 

401 ATTCGTAA 

This corresponds to the amino acid sequence <SEQ ID 508; ORF113>:

1 ..GGGFINASCA TLTTAKPQYQ AGDLSAFKIR QGNVVIAGHG LDARDTDYTR
51 ILSYHSKIDA PVWGQDVRVV AGQNDVAATG DAHSPILNNA AANTSNNTAN
101 NGTHIPLFAI DTGKLGGXVC QQNHLDQYGR ASRHS*

Computer analysis of this amino acid sequence gave the following results: 



Homology with the pspA putative secreted protein of N. meningitidis (accession number AF030941)
ORF113 and pspA show 44% aa identity in 179aa overlap:



orf 113 GGGFINASCATLTTAKPQYQAGDLSAFKIRQGNWIAGHGLDARDTDYTRILSYHSKIDA 60 

GGG INA+ TLT+ P G+L+ F -f G WI G GLD D DYTRILS -f+I+A 
pspa GGGLINAASVTLTSGVPVLNNGNLTGFDVSSGKWIGGKGLDTSDADYTRILSRAAEINA 256 

25 orf 113 PVWGQDVRWAGQNDVAATGDAHSPILXXXXXXXXXXXXXXGTHIPLFAIDTGKLGGMYA 120 
VWG+DV+W+G+N + G + P AIDT LGGMYA 

pspa GVWGKD VKVV S GKNKLD FDG SLAKT AS AP S S S D S VT PTVAI DT ATLGGMYA 307 

orf 113 NK I TL I S T VEQAG I RNQGQW FAS AGN VAVN AEGKL VNTGMI AATGEN HAVS LHARNVHN 17 9 
30 ' +KITLIST A IRN+G+ FA+ G V ++A+GKL N+G I A +++ A+ V N 

pspa DKITLI STDNGAVIRNKGRI FAATGGVTLSADGKLSNSGS IDAA EITISAQTVDN 3 62 

Homology with a predicted ORF from N. gonorrhoeae 

ORF113 shows 86.5% identity in 52aa overlap at the N-terminal part and 94.1% identity in 17aa
overlap at the C-terminal part with a predicted ORF (ORF113ng) from N. gonorrhoeae:

orf 113 GGGFINASCATLTTAKPQYQAGDLSAFKIR 30 

I I I I I I I I I I I I I : : I M I M I : t : I I I i 
orfll3ng SHPSQLNGYIEVGGRRAEWIANPAGIAVNGGGFINASRATLTTGQPQYQAGDFSGFKIR 224 

40 orf 113 QGNVVIAGHGLDARDTDYTRILSYHSKIDAPVWGQDVRWAGQNDVAATGDAHSPILNNA 90 

I I I : I I t M I I I I I I I I : I I I I 
orfll3ng QGNAVIAGHGLDARDTDFTRILVCQQNHLDQYGRTSRHS 263 

or f 1 1 3 I DTGKLGGXVCQQNHLDQYGRASRHS 135 

45 || | | | | | M || | : . | | | | 

orf 113ng D FSG FK I RQGNAV I AG HGL D ARDT D FTR I L VCQQNHLDQ YGRT S RH S 2 63 



The complete length ORF113ng nucleotide sequence <SEQ ID 509> is predicted to encode a
protein having amino acid sequence <SEQ ID 510>:



1 MNKTLYRVIF NRKRGAWAV AETTKREGKS CADSGSGSVY VKSVSFIPTH 
51 SKAFCFSALG FSLCLALGTV NIAFADGIIT DKAAPKTQQA TILQTGNGIP 
101 QVNIQTPTSA GVSVNQYAQF DVGNRGAILN NSRSNTQTQL GGWIQGNPWL 
151 TRGEARVWN QINSSHPSQL NGYIEVGGRR AEWIANPAG IAVNGGGFIN 
201 ASRATLTTGQ PQYQAGDFSG FKIRQGNAVI AGHGLDARDT DFTRILVCQQ 
251 NHLDQYGRTS RHS* 

Based on this analysis, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 61 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 511>:

1 . . TCAACGGGAC ATAGCGAACA AAATTACACT TTGCCGCGAG AAATCACACG 

51 CAACATTTCA CTGGGTTCAT TTGCCTATGA ATCGCATCGC AAAGCATTAA 

101 GCCATCATGC GCCCAGCCAA GGCACTGAGT TGCCGCAAAG CAACGGTATT 

151 TCGCTACCCT ATACGTCCAA TTCTTTTACC CCATTACCCA GCAGCAGCTT 

201 ATACATTATC AATCCTGTCA ATAAAGGCTA TCTTGTTGAA ACCGATCCAC

251 GCTTTGCCAA CTACCGTCAA TGGTTGGGTA GTGACTATAT GCtGGACAGC 

301 CTCAAACTAG ACCCAAACAA TTTACATAAA CGTTTGGGTG ATGGTTATTA 

351 CGAGCAACGT TTAATCAATG AACAAATCGC AGAGCTGACA GGGCATCGTC 

401 GTTTAGAcGG TTATCAAAAC GACGAAGAAC AATTTAAAGC CTTAATGGAT 

451 AATGGCGCGA CTGCGGCACG TTcGATGAAT CTCAGCGTTG GCATTGCATT 

501 AAGTGCCGAG CAAGTAGCGC AACTGACCAG CGATATTGTT TGGTTGGTAC 

551 AAAAAGAAGT TAAGCTTCCT GATGGCGGCA CACAAACCGT ATTGGTGCCA 

601 CAGGTTTATG TACGCGTTAA AAATGGCGAC ATAGACGGTA AAGGTGCATT 

651 GTTGTCAGGC AGCAATACAC AAATCAATGT TTCAGGCAGC CTGAAAAACT 

701 CAGGCACGAT TGCAGGgCGC AATGCGCTTA TTATCAATAC CGATACGCTA 

751 GACAATATCG GTGGGCGTAT TCATGCGCAA AAATCAGCGG TTACGGCCAC 

801 ACAAGACATC AATAATATTG GCGGCATGCT TTCTGCCGAA CAGACATTAT 

851 TGCTCAACGC AGGCAACAAC ATCAACAGCC AAAGCACCAC CGCCAGCAGT 

901 CAAAATACAC AAGGCAGCAG CACCTACCTA GACCGAATGG CAGGTATTTA 

951 TATCACAGGC AAAGAAAAAG GTGTTT . . 

This corresponds to the amino acid sequence <SEQ ID 512; ORF115>: 

1 . . STGHSEQNYT LPREITRNIS LGSFAYESHR KALSHHAPSQ GTELPQSNGI 

51 SLPYTSNSFT PLPSSSLYII NPVNKGYLVE TDPRFANYRQ WLGSDYMLDS 

101 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD 

151 NGATAARSMN LSVGIALSAE QVAQLTSDIV WLVQKEVKLP DGGTQTVLVP 

201 QVYVRVKNGD IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL 

251 DNIGGRIHAQ KSAVTATQDI NNIGGMLSAE QTLLLNAGNN INSQSTTASS 

301 QNTQGSSTYL DRMAGIYITG KEKGV. . 

Computer analysis of this amino acid sequence gave the following results: 

Homology with the pspA putative secreted protein of N. meningitidis (accession number AF030941)
ORF115 and pspA protein show 50% aa identity in 325aa overlap:



Orf115:    1 STGHSEQNYTLPREITRNISLGSFAYESHRKALSHHAPSQGTELPQSNGISLPYTSNSFT 60
             STG+S   Y    E++ +I +G  AY+ + + P + NGI +T
pspA:    778 STGYSRSPYEPAPEVS-SIRMGISAYKGYAPQQASDIPGTVVPVVAENGIHPTFT 831

Orf115:   61 PLPSSSLYIINPVNKGYLVETDPRFANYRQWLGSDYMLDSLKLDPNNLHKRLGDGYYEQR 120
              LP+SSL+ I P NKGYL+ETDP F +YR+WLGS YML +L+ DPN++HKRLGDGYYEQ+
pspA:    832 -LPNSSLFAIAPNNKGYLIETDPAFTDYRKWLGSGYMLAALQQDPNHIHKRLGDGYYEQK 890

Orf115:  121 LINEQIAELTGHRRLDGYQNDEEQFKALMDNGATAARSMNLSVGIALSAEQVAQLTSDIV 180
             L+NEQIA+LTG+RRLDGY NDEEQFKALMDNG T A+ + L+ GIALSAEQVA+LTSDIV
pspA:    891 LVNEQIAKLTGYRRLDGYTNDEEQFKALMDNGITIAKELQLTPGIALSAEQVARLTSDIV 950
950 
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10 



OrfllS: 


181 


pspA: 


951 


OrfllS: 


240 


pspA: 


1010 


Orfll5: 


300 


pspA: 


1069 



W Jj V yr\Ej v r\Jj ruuuiyi vojv j. y/ v * v " v -•- — * «- — 

WL + V LPDG TQTVL P+VYVR + D++G+GALLSGS I SG+++N G IAG 



R ALI+N + N+ G + + 



A DI N G + AE LLL A 



+ R+AGIY+TG++ G 



Homology with a predicted ORF from N. gonorrhoeae 

ORF115 shows 91.9% identity over a 334aa overlap with a predicted ORF (ORF115ng) from
N. gonorrhoeae:

orf 115. pep 
orfll5ng 
orf 115 .pep 
orf 115ng 
orfll5.pep 
orfllSng 
orf 115 . pep 
orfllSng 



20 



25 



30 



STGHSEQNYTLPREITRNISLGSFAYESHRK 31 
III I I II I I I : I I I I : I I I I I I I I II I 1 

NEQTFGEKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDISLGSFAYESHSK 71 

ALSHHAPSQGTELPQSN GISLPYTSNSFTPLPSSSLYIINPVNKGYLVET 81 

I | | : || I 1 I 1 I 1 I I I I I I I II I I I I I I II I I : M II M II : I I I I M I I 

ALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYIINPANKGYLVET 131 



DPRFANYRQWLGSDYMLDSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND 
M | | I I I I I I I II II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I 
DPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND 



141 



191 



201 



35 



40 



orf 115 .pep 
orf 115ng 
orf 115. pep 
orfllSng 



EEQFKALMDNGATAARSMNLSVGIALSAEQVAQLTSDIVWLVQKEVKLPDGGTQTVLVPQ 

I 1 I II I I I I I I I I I II I I I I I I I! I I II I I : I M I I I I I I I ! I I I I I I I I I I I I I I I : I I 

EEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDIVWLVQKEVKLPDGGTQTVLMPQ 251 

VYVRVKNGDIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALIINTDTLDNIGGRIHAQK 2 61 
I I I I I I I I I I I I I I I I I I I I I I M II I I I I I I I I I I I I I I I I I I I I I I I 11 I I I II I I I 

VYVRVKNGGIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALIINTDTLDNIGGRIHAQK 311 

SAVTATQDINNIGGMLSAEQTLLLNAGNNINSQSTTASSQNTQGSSTYLDRMAGIYITGK 321 
I I I I I I I I II II I I : I I II I I I I I I I I I I I I : I I I : I I II : I I I I I II I M I II 1 I I I I 

SAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTAKSSQNAQGSSTYLDRMAGIYITGK 371 



EKGV 
I I I I 

EKGVLAAQAGKDINIIAGQISNQSDQGQTRLQAGRDINLDTVQTGKYQEIHFDADNHTIR 



325 
431 



orf 115. pep 
orfllSng 

An ORF115ng nucleotide sequence <SEQ ID 513> was predicted to encode a protein having amino
acid sequence <SEQ ID 514>:

1 MLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT

51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI

101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS

151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD

201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP

251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL

301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS

351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT

401 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL

451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG

501 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTRI

551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS

601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ

651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAK QFDKAKTTAL

701 MPWRLPMQVG RLFKQAKAPK



Further work revealed the following partial gonococcal DNA sequence <SEQ ID 515>:

1 TTGCTTGTGC AAACAGAAAA AGACGGTTTG CATAACGAGC AAACCTTTGG

51 CGAGAAGAAA GTCTTCAGCG AAAATGGTAA GTTGCACAAC TACTGGCGTG 

101 CGCGTCGTAA AGGACATGAT GAAACAGGGC ATCGTGAACA AAATTATACT 

151 TTGCCGGAGG AAATCACACG CGACATTTCA CTGGGTTCAT TTGCCTATGA 

201 ATCGCATAGC AAAGCATTAA GCCGTCATGC GCCCAGCCAA GGCACTGAGT 

251 TGCCACAAAG TAACCGGGAT AATATCCGTA CTGCGAAAAG CAACGGTATT 

301 TCGCTACCCT ATACGCCCAA TTCTTTTACC CCATTACCCG GCAGCAGCTT 

351 ATACATTATC AATCCTGCCA ATAAAGGCTA TCTTGTTGAA ACCGATCCAC 

401 GCTTTGCCAA CTACCGTCAA TGGTTGGGTA GTGACTATAT GCTGGGCAGC 

451 CTCAAACTAG ACCCAAACAA TTTACATAAA CGTTTGGGTG ATGGTTATTA 

501 CGAGCAACGT TTAATCAATG AACAAATCGC AGAGCTGACA GGGCATCGTC 

551 GTTTAGACGG TTATCAAAAC GACGAAGAAC AATTTAAAGC CTTAATGGAT 

601 AATGGCGCGA CTGCGGCACG TTCGATGAAT CTCAGCGTTG GCATTGCATT 

651 AAGTGCCGAG CAAGCAGCGC AACTGACCAG CGATATTGTT TGGTTGGTAC 

701 AAAAAGAAGT TAAACTTCCT GATGGCGGCA CACAAACCGT ATTGATGCCA 

751 CAGGTTTATG TACGCGTTAA AAATGGCGGC ATAGACGGTA AAGGTGCATT 

801 GTTGTCAGGC AGCAATACAC AAATCAATGT TTCAGGCAGC CTGAAAAACT 

851 CAGGCACGAT TGCAGGGCGC AATGCGCTTA TTATCAATAC CGATACGCTA 

901 GACAATATCG GTGGGCGTAT TCATGCGCAA AAATCAGCGG TTACGGCCAC 

951 ACAAGACATC AATAATATTG GCGGCATTCT TTCTGCCGAA CAGACATTAT 

1001 TGCTCAATGC GGGTAACAAC ATCAACAACC AAAGCACGGC CAAGAGCAGT 

1051 CAAAATGCAC AAGGTAGCAG CACCTACCTA GACCGAATGG CAGGTATTTA 

1101 TATCACAGGC AAAGAAAAAG GTGTTTTAGC AGCGCAGGCA GGCAAAGACA 

1151 TCAACATCAT TGCCGGTCAA ATCAGCAATC AATCAGATCA AGGGCAAACC 

1201 CGGCTGCAGG CAGGACGCGA CATTAACCTG GATACGGTAC AAACCGGCAA 

1251 ATATCAAGAA ATCCATTTTG ATGCCGATAA CCATACCATC CGAGGTTCAA 

1301 CGAACGAAGT CGGCAGCAGC ATTCAAACAA AAGGCGATGT TACCCtatTG 

1351 TCAGGGAATA ATCTCAATGC CAAAGCTGCC GAAGTCGGCA GCGCAAAAGG 

1401 CACACTTGCC GTGTATGCTA AAAATGACAT TACTATCAGC TCAGGCATCC 

1451 ATGCCGGCCA AGTTGATGAT GCGTCCAAAC ATACAGGCAG AAGCGGCGGC

1501 GGTAATAAAT TAGTCATTAC CGATAAAGCC CAAAGTCATC ACGAAACTGC 

1551 TCAAAGCAGC ACCTTTGAAG GCAAGCAAGT TGTATTGCAG GCAGGAAACG 

1601 ATGCCAACAT CCTTGGCAGT AATGTTATTT CCGATAATGG CACCCGGATT 

1651 CAAGCAGGCA ATCATGTTCG CATTGGTACA ACCCAAACTC AAAGCCAAAG 

1701 CGAAACCTAT CATCAAACCC AAAAATCAGG ATTGATGAGT GCAGGTATCG 

1751 GCTTCACTAT TGGCAGCAAG ACAAACACAC AAGAAAACCA ATCCCAAAGC

1801 AACGAACATA CAGGCAGTAC CGTAGGCAGC CTGAAAGGCG ATACCACCAT 

1851 TGTTGCAAGC AAACACTACG AACAAACCGG CAGCAACGTT TCCAGCCCTG 

1901 AGGGCAACAA CCTTATCAGC ACGCAAAGTA TGGATATTGG CGCAGCACAA 

1951 AACCAATTAA ACAGCAAAAC CACCCAAACC TACGAACAAA AAGGCTTAAC 

2001 GGTGGCATTC AGTTCGCCCG TTACCGATTT GGCACAACAA GCGATTGCCG 

2051 TAGCACACAA AGCAGCAAAC AAGTCGGACA AAGCAAAAAC GACCGCGTTA 

2101 ATGCCATGGC GGCTGCCAAT GCAGGTTGGC AGGCCTATCA AACAGGCAAA 

2151 GGCGCACAAA ACTTAG 

This corresponds to the amino acid sequence <SEQ ID 516; ORF115ng-1>:

1 LLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT 

51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI 

101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS 

151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD 

201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP 

251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL 

301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS 

351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT 

401 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL 

451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG

501 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTRI

551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS 

601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ 

651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAN KSDKAKTTAL 

701 MPWRLPMQVG RPIKQAKAHK T* 

This gonococcal protein (ORF115ng-1) shows 91.9% identity with ORF115 over 334aa:

20 30 40 50 60 70 

orfll5ng-l .p NEQTFGEKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDISLGSFAYESHSK 

III I I I 1 I I i : I I 1 I : I I I i I I I I I ! I I 
orfll5 STGHSEQNYTLPREITRNISLGSFAYESHRK 

10 20 30 
80 90 100 110 120 130

orfll5na-l p ALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYIINPANKGYLVET 
|||:!MMMMI!M I I I I I I I I I I I I I 1 : I I I I I I I I : I I I I I I I I 

S orf 115 ALSHHAPSQGTELPQSN GISLPYTSNSFTPLPSSSLYIINPVNKGYLVET 

40 50 60 70 80

140 150 160 170 180 190 

orfll5ng-l p DPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND 

10 I I I I t I I I I I II t II t I I I I 11 I I I I I I M I! I I I I I 1 I I I 1 I I I M I ! 1 I I I I 

orf 115 DPRFANYRQWLGSDYMLDSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND 
90 100 110 120 130 140 

200 210 220 230 240 250 

15 orf 115ng-l .p EEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDIVWLVQKEVKLPDGGTQTVLMPQ 

NIIIMI!IINIMIfiliill!IIII!:ilillillllllt<l<ilitltllll:l< 
orf 115 EEQFKALMDNGATAARSMNLSVGIALSAEQVAQLTSDIVWLVQKEVKLPDGGTQTVLVPQ 
150 160 170 180 190 200 

20 260 270 280 290 300 310 

orf 115ng-l.p V YVRVKNGG I DGKGALL SGSNTQINVSGS LKN S GT I AGRN AL 1 1 NT DT LDN I GGR I HAQK 
I I I I I I I I I I I M I I I I M I I I I I I I I I M 1 M M 1 I I I I I I ! 1 1 I I I I I I I I I 1 I I ! I 
orf 115 VYVRVKNGDIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALIINTDTLDNIGGRIHAQK 
210 220 230 240 250 260 

25 

320 330 340 350 360 370 

orfll5ng-l.p SAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTAKSSQNAQGSSTYLDRMAGIYITGK 
I I II I j I I I I I I I I : I I I 1 I i M I I i I I M i : I I I • I I M : i I I I I I I I M i I I I I I I I 
orf 115 SAVTATQDINNIGGMLSAEQTLLLNAGNNINSQSTTASSQNTQGSSTYLDRMAGIYITGK 
30 270 280 290 300 310 320 

380 390 400 410 420 430 

orf 115ng-l . p EKGVLAAQAGKDINIIAGQISNQSDQGQTRLQAGRDINLDTVQTGKYQEIHFDADNHTIR 
Mil 

35 orfllS EKGV 

In addition, it shows homology with a secreted N. meningitidis protein in the database: 

gi|2623258 (AF030941) putative secreted protein [Neisseria meningitidis] Length = 2273

Score = 604 bits (1541), Expect = e-172
Identities = 325/678 (47%), Positives = 449/678 (65%), Gaps = 22/678 (3%)

Query: 1 LLVQTEKDGLHNEQTFGEKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDIS 60 

L+V T + L N++T G K + ++ G LR Y R +KG D TG+ Y E++ I 
Sbjct: 739 LIVGTPESALDNDETLGTKTI-TDKGDLHRYHRHHKKGRDSTGYSRSPYEPAPEVS-SIR 7 96 

45 

Query: 61 LGS FAYESHSKALSRHAPSQGTELPQSNRDNIRTAKSNGI SLPYT PNS FTPLPGS SLYI I 120 

+G AY+ + AP Q +++P + + NGI +T LP SSL+ I 

Sbjct: 797 MGISAYKGY APQQASDIPGTV VPVVAENGIHPTFT LPNSSLFAI 840 

50 Query: 121 NPANKGYLVETDPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELT 180 

P NKGYL+ETDP F +YR+WLGS YML +L+ DPN++HKRLGDGYYEQ+L+NEQIA+LT 
Sbjct: 841 APNNKGYLIETDPAFTDYRKWLGSGYMLAALQQDPNHIHKRLGDGYYEQKLVNEQIAKLT 900 

Query: 181 GHRRLDGYQNDEEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDIVWLVQKEVKLP 240 
55 G-fRRLDGY NDEEQFKALMDNG T A+ + L+ GIALSAEQ A+LTSDIVWL + V LP 

Sbjct: 901 GYRRLDGYTNDEEQFKALMDNGITIAKELQLTPGIALSAEQVARLTSDIVWLENETVTLP 960 

Query: 241 DGGTQTVLMPQVYVRVKNGGIDGKGALLSGSNTQINVSGSLKN-SGTIAGRNALIINTDT 299* 
DG TQTVL P+VYVR + ++G+GALLSGS I SG+++N G IAGR ALI+N 
60 Sbjct: 961 DGTTQTVLKPKVYVRARPKDMNGQGALLSGSVVDIG-SGAIENRGGLIAGREALILNAQN 1019 

Query: 300 LDNIGGRIHAQKSAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTAKSSQNAQGSSTY 359 

+ N+ G + + A DI N G I AE LLL A NNI ++S +S+QN QGS 

Sbjct: 1020 IKNLQGDLQGKNIFAAAGSDITNTGSI-GAENALLLKASNNXESRSETRSNQNEQGSVRN 1078 



65 



Query: 360 LDRMAGIYITGKEKGVLAAQAGKDINIIAGQISNQSDQGQTRLQAGRDINLDTVQTGKYQ 419 

+ R+AGIY+TG++ G + AG +1 + A +++NQS+ GQT L AG DI DT + Q 
Sbjct: 1079 IGRVAGIYLTGRQNGSVLLDAGNNIVLTASELTNQSEDGQTVLNAGGDIRSDTTGISRNQ 1138 



Query: 


420 


Sb j ct : 


1139 


Query: 


480 


Sb j ct : 


1199 


Query: 


540 


Sbjct: 


1259 


Query: 


599 


Sbjct: 


1319 


Query: 


659 


Sbjct: 


1379 



FD+DN+ IR NEVGS+I+T+G+++L + ++ +AAEVGS +G L + A DI + 
NTIFDSDNYVIRKEQNEVGSTIRTRGNLSLNAKGDIRIRAAEVGSEQGRLKLAAGRDIKV 

SSGIHAGQVDDASKHTGRSGGGNKLVITDKAQSHHETAQSSTFEGKQWLQAGNDANILG 
+G + +DA K+TGRSGGG K +T ++ + AST +GK+++L +G D + G 
EAGKAHTETEDALKYTGRSGGGIKQKMTRHLKNQNGQAVSGTLDGKEIILVSGRDITVTG 

SNVISDNGTRIQAGNHVRIGTTQTQSQSETYHQTQKSGLM-SAGIGFTIGSKTNTQENQS 
SN+I+DN T + A N++ + +T+S+S ++ +KSGLM S GIGFT GSK +TQ N+S 
SN 1 1 ADNHT I L S AKNNI VLKAAETRSRSAEMNKKEKSGLMGSGGI GFTAGSKKDTQTNRS 

QSNEHTGSTVGSLKGDTTIVASKHYEQTGSNVSSPEGNNLISTQSMDIGAAQNQLNSKTT 
15 J ++ HT S VGSL G+T I A KHY QTGS +SSP+G+ IS+ + I AAQN+ + ++ 



Q YEQKG+TVA S PV + 



Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 62 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 517>: 

1 . . TCAGGGAATA ACCTCAATGC CAAAGCTGCC GAAGTCAGCA GCGCAAACGG

51 TACACTCGCT GTGTCTGCCA ATAATGACAT CAACATCAGC GCAGGCATCA 

101 ACACGACCCA TGTTGATGAT GCGTCCAAAC ACACAGGCAG AAGCGGTGGT 

151 GGCAATAAAT TAGTCATTAC CGATAAAGCC CAAAGTCATC ACGAAACCGC 

201 CCAAAGCAGC ACCTTTGAAG GCAAGCAAGT TGTATTGCAG GCAGGAAACG 

251 ATGCCAACAT CCTTGGCAGC AATGTTATTT CCGATAATGG CACCCAGATT

301 CAAGCAGGCA ATCATGTTCG CATTGGTACA ACCCAAACTC AAAGCCAAAG 

351 CGAAACCTAT CATCAAACCC AGAAATCAGG ATTGATGAGT GCAGGTATCG 

401 GCTTCACTAT TGGCAGCAAG ACAAACACAC AAGAAAACCA ATCCCAAAGC

451 AACGAACATA CAGGCAGTAC CGTAGGCAGC TTGAAAGGCG ATACCACCAT

501 TGTTGCAGGC AAACACTACG AACAAATCGG CAGTACCGTT TCCAGCCCGG

551 AAGGCAACAA TACCATCTAT GCCCAAAGCA TAGACATTCA AGCGGCACAC 

601 AACAAATTAA ACAGTAATAC CACCCAAACC TATGAACAAA AAGG . CTAAC 

651 GGTGGCATTC AGTTCGCCCG TTACCGATTT GGCACAACAA . . . 

This corresponds to the amino acid sequence <SEQ ID 518; ORF117>:

1 . . SGNNLNAKAA EVSSANGTLA VSANNDINIS AGINTTHVDD ASKHTGRSGG

51 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTQI 

101 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS 

151 NEHTGSTVGS LKGDTTIVAG KHYEQIGSTV SSPEGNNTIY AQSIDIQAAH 

201 NKLNSNTTQT YEQKXLTVAF SSPVTDLAQQ . . . 

Computer analysis of this amino acid sequence gave the following results:

Homology with the pspA putative secreted protein of N. meningitidis (accession number AF030941)
ORF117 and pspA protein show 45% aa identity in 224aa overlap:

Orf117:    4 NLNAKAAEVSSANGTLAVSANNDINISAGINTTHVDDASKHTGRSGGGNKLVITDKAQSH 63
             ++  +AAEV S  G L ++A  DI + AG   T  +DA K+TGRSGGG K  +T   ++
pspA:   1173 DIRIRAAEVGSEQGRLKLAAGRDIKVEAGKAHTETEDALKYTGRSGGGIKQKMTRHLKNQ 1232

Orf117:   64 HETAQSSTFEGKQVVLQAGNDANILGSNVISDNGTQIQAGNHVRIGTTQTQSQSETYHQT 123
             +  A S T +GK+++L +G D  + GSN+I+DN T + A N++ +   +T+S+S   ++
pspA:   1233 NGQAVSGTLDGKEIILVSGRDITVTGSNIIADNHTILSAKNNIVLKAAETRSRSAEMNKK 1292

Orf117:  124 QKSGLM-SAGIGFTIGSKTNTQENQSQSNEHTGSTVGSLKGDTTIVAGKHYEQIGSTVSS 182
             +KSGLM S GIGFT GSK +TQ N+S++  HT S VGSL G+T I AGKHY Q GST+SS
pspA:   1293 EKSGLMGSGGIGFTAGSKKDTQTNRSETVSHTESVVGSLNGNTLISAGKHYTQTGSTISS 1352

Orf117:  183 PEGNNTIYAQSIDIQAAHNKLNSNTTQTYEQKXLTVAFSSPVTD 226
             P+G+  I +  I I AA N+ +  + Q YEQK +TVA S PV +
pspA:   1353 PQGDVGISSGKISIDAAQNRYSQESKQVYEQKGVTVAISVPVVN 1396
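
The middle line of such a pairwise alignment repeats the residue wherever query and subject agree. The sketch below rebuilds only that identity part for two aligned rows; it is a simplification, since the '+' marks for conservative substitutions depend on a substitution matrix that is not reproduced here. `identity_midline` is an illustrative name, not a function of the original search program.

```python
def identity_midline(query: str, subject: str) -> str:
    """Return a midline with the residue at identical columns, else a space.

    Simplification: conservative substitutions (shown as '+' in the
    alignments above) are left blank because no scoring matrix is used.
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    return "".join(
        q if q == s and q != "-" else " " for q, s in zip(query, subject)
    )


# First ten columns of the last alignment block above (Orf117 183-192).
query = "PEGNNTIYAQ"
subject = "PQGDVGISSG"
print(query)
print(identity_midline(query, subject))
print(subject)
```

Counting the non-blank characters of this midline over all columns gives the number of identities reported for the alignment.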

Homology with a predicted ORF from N. gonorrhoeae 
ORF117 shows 90% identity over a 230aa overlap with a predicted ORF (ORF117ng) from
N. gonorrhoeae:

orfll7 pep SGNNLNAKAAEVSSANGTLAVSANNDINIS 30 

I II I I M I I I I I : ! I : I I I I I I : ! I I : I I 
orfll7nq IHFDADNHTIRGSTNEVGSSIQTKGDVTLLSGNNLNAKAAEVGSAKGTLAVYAKNDITIS 480 

15 

orf 117 . pep AGINTTHVDDASKHTGRSGGGNKLVITDKAQSHHETAQSSTFEGKQWLQAGNDANILGS 90 

: M : : : I I I ! I I II I I I I M I I 1 I ! I I I ! M II I I I I 1 ! I I i I I i ! I f M i I I 1 i I I ! i 
orfll7ng SG IHAGQVDDASKHTGRSGGGNKLVITDKAQSHHETAQS STFEGKQVVLQAGN DAN I LGS 540 

Q 20 orf 117 .pep NVISDNGTQIQAGNHVRIGTTQTQSQSETYHQTQKSGLMSAGIGFTIGSKTNTQENQSQS 150 

( I I I I I M : I ( f I I t I I I I I i ( I I i ( I I ( If i I I I I M I I I t I I I I I ( I I I I I I ( I I I M 
ri orf 117ng NVISDNGTRIQAGNHVRIGTTQTQSQSETYHQTQKSGLMSAGIGFTIGSKTNTQENQSQS 600 

O orf 117 .pep NEHTGSTVGSLKGDTTIVAGKHYEQIGSTVSSPEGNNTIYAQSIDIQAAHNKLNSNTTQT 210 

h! 25 I I I I I I I M I M I I t I M t : I I i I I 1 I : I I I 1 ! I I I I : I I : I ! I I : I : II I : I t I I 

sfs orf 117ng NEHTGSTVGSLKGDTTIVASKHYEQTGSNVSSPEGNNLISTQSMDIGAAQNQLNSKTTQT 660 

M orf 117. pep YEQKXLT VAFS S PVT DLAQQ 230 

IS 1 1 1 I I I 1 1 I I I I I I 1 1 1 1 I 

30 orf 117ng YEQKGLTVAFSSPVTDLAQQAIAVAHKAAKQFDKAKTTALMPWRLPMQVGRLFKQAKAPK 7 20 

An ORF117ng nucleotide sequence <SEQ ID 519> was predicted to encode a protein having amino
acid sequence <SEQ ID 520>:

1 LLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT

51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI

101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS

151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD

201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP

251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL

301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS

351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT

401 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL

451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG

501 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTRI

551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS

601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ

651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAK QFDKAKTTAL

701 MPWRLPMQVG RLFKQAKAPK K*

Further work revealed the following gonococcal partial DNA sequence <SEQ ID 521>: 

1 TTGCTTGTGC AAACAGAAAA AGACGGTTTG CATAACGAGC AAACCTTTGG 

51 CGAGAAGAAA GTCTTCAGCG AAAATGGTAA GTTGCACAAC TACTGGCGTG 

101 CGCGTCGTAA AGGACATGAT GAAACAGGGC ATCGTGAACA AAATTATACT 

151 TTGCCGGAGG AAATCACACG CGACATTTCA CTGGGTTCAT TTGCCTATGA 

201 ATCGCATAGC AAAGCATTAA GCCGTCATGC GCCCAGCCAA GGCACTGAGT 

251 TGCCACAAAG TAACCGGGAT AATATCCGTA CTGCGAAAAG CAACGGTATT 

301 TCGCTACCCT ATACGCCCAA TTCTTTTACC CCATTACCCG GCAGCAGCTT 

351 ATACATTATC AATCCTGCCA ATAAAGGCTA TCTTGTTGAA ACCGATCCAC 

401 GCTTTGCCAA CTACCGTCAA TGGTTGGGTA GTGACTATAT GCTGGGCAGC 

451 CTCAAACTAG ACCCAAACAA TTTACATAAA CGTTTGGGTG ATGGTTATTA 

501 CGAGCAACGT TTAATCAATG AACAAATCGC AGAGCTGACA GGGCATCGTC 

551 GTTTAGACGG TTATCAAAAC GACGAAGAAC AATTTAAAGC CTTAATGGAT 

601 AATGGCGCGA CTGCGGCACG TTCGATGAAT CTCAGCGTTG GCATTGCATT 

651 AAGTGCCGAG CAAGCAGCGC AACTGACCAG CGATATTGTT TGGTTGGTAC 

701 AAAAAGAAGT TAAACTTCCT GATGGCGGCA CACAAACCGT ATTGATGCCA 

751 CAGGTTTATG TACGCGTTAA AAATGGCGGC ATAGACGGTA AAGGTGCATT 

801 GTTGTCAGGC AGCAATACAC AAATCAATGT TTCAGGCAGC CTGAAAAACT 

851 CAGGCACGAT TGCAGGGCGC AATGCGCTTA TTATCAATAC CGATACGCTA 

901 GACAATATCG GTGGGCGTAT TCATGCGCAA AAATCAGCGG TTACGGCCAC 

951 ACAAGACATC AATAATATTG GCGGCATTCT TTCTGCCGAA CAGACATTAT 

1001 TGCTCAATGC GGGTAACAAC ATCAACAACC AAAGCACGGC CAAGAGCAGT 

1051 CAAAATGCAC AAGGTAGCAG CACCTACCTA GACCGAATGG CAGGTATTTA 

1101 TATCACAGGC AAAGAAAAAG GTGTTTTAGC AGCGCAGGCA GGCAAAGACA 

1151 TCAACATCAT TGCCGGTCAA ATCAGCAATC AATCAGATCA AGGGCAAACC 

1201 CGGCTGCAGG CAGGACGCGA CATTAACCTG GATACGGTAC AAACCGGCAA 

1251 ATATCAAGAA ATCCATTTTG ATGCCGATAA CCATACCATC CGAGGTTCAA 

1301 CGAACGAAGT CGGCAGCAGC ATTCAAACAA AAGGCGATGT TACCCtatTG 

1351 TCAGGGAATA ATCTCAATGC CAAAGCTGCC GAAGTCGGCA GCGCAAAAGG 

1401 CACACTTGCC GTGTATGCTA AAAATGACAT TACTATCAGC TCAGGCATCC 

1451 ATGCCGGCCA AGTTGATGAT GCGTCCAAAC ATACAGGCAG AAGCGGCGGC 

1501 GGTAATAAAT TAGTCATTAC CGATAAAGCC CAAAGTCATC ACGAAACTGC 

1551 TCAAAGCAGC ACCTTTGAAG GCAAGCAAGT TGTATTGCAG GCAGGAAACG 

1601 ATGCCAACAT CCTTGGCAGT AATGTTATTT CCGATAATGG CACCCGGATT 

1651 CAAGCAGGCA ATCATGTTCG CATTGGTACA ACCCAAACTC AAAGCCAAAG 

1701 CGAAACCTAT CATCAAACCC AAAAATCAGG ATTGATGAGT GCAGGTATCG 

1751 GCTTCACTAT TGGCAGCAAG ACAAACACAC AAGAAAACCA ATCCCAAAGC 

1801 AACGAACATA CAGGCAGTAC CGTAGGCAGC CTGAAAGGCG ATACCACCAT 

1851 TGTTGCAAGC AAACACTACG AACAAACCGG CAGCAACGTT TCCAGCCCTG 

1901 AGGGCAACAA CCTTATCAGC ACGCAAAGTA TGGATATTGG CGCAGCACAA 

1951 AACCAATTAA ACAGCAAAAC CACCCAAACC TACGAACAAA AAGGCTTAAC 

2001 GGTGGCATTC AGTTCGCCCG TTACCGATTT GGCACAACAA GCGATTGCCG 

2051 TAGCACACAA AGCAGCAAAC AAGTCGGACA AAGCAAAAAC GACCGCGTTA 

2101 ATGCCATGGC GGCTGCCAAT GCAGGTTGGC AGGCCTATCA AACAGGCAAA 

2151 GGCGCACAAA ACTTAG 

This corresponds to the amino acid sequence <SEQ ID 522; ORF117ng-1>: 



1 LLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT 

51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI 

101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS 

151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD 

201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP 

251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL 

301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS 

351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT 

401 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL 

451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG 

501 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTRI 

551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS 

601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ 

651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAN KSDKAKTTAL 

701 MPWRLPMQVG RPIKQAKAHK T* 

ORF117ng-1 shows the same 90% identity over a 230aa overlap with ORF117. In addition, it 
shows homology with a secreted N. meningitidis protein in the database: 



gi|2623258 (AF030941) putative secreted protein [Neisseria meningitidis] Length = 2273 

Score = 604 bits (1541), Expect = e-172 

Identities = 325/678 (47%), Positives = 449/678 (65%), Gaps = 22/678 (3%) 

Query: 1 LLVQTEKDGLHNEQTFGEKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDIS 60 

L+V T + L N++T G K + ++ G LH Y R +KG D TG+ Y E++ I 
Sbjct: 739 LIVGTPESALDNDETLGTKTI-TDKGDLHRYHRHHKKGRDSTGYSRSPYEPAPEVS-SIR 796 

Query: 61 LGSFAYESHSKALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYII 120 

+G AY+ + AP Q +++P + + NGI +T LP SSL+ I 

Sbjct: 797 MGISAYKGY-------APQQASDIPGTV---VPVVAENGIHPTFT------LPNSSLFAI 840 



Query: 121 NPANKGYLVETDPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELT 180 
P NKGYL+ETDP F +YR+WLGS YML +L+ DPN++HKRLGDGYYEQ+L+NEQIA+LT 
Sbjct: 841 APNNKGYLIETDPAFTDYRKWLGSGYMLAALQQDPNHIHKRLGDGYYEQKLVNEQIAKLT 900 

Query: 181 GHRRLDGYQNDEEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDIVWLVQKEVKLP 240 

G+RRLDGY NDEEQFKALMDNG T A+ + L+ GIALSAEQ A+LTSDIVWL + V LP 
Sbjct: 901 GYRRLDGYTNDEEQFKALMDNGITIAKELQLTPGIALSAEQVARLTSDIVWLENETVTLP 960 

Query: 241 DGGTQTVLMPQVYVRVKNGGIDGKGALLSGSNTQINVSGSLKN-SGTIAGRNALIINTDT 299 

DG TQTVL P+VYVR + ++G+GALLSGS I SG+++N G IAGR ALI+N 
Sbjct: 961 DGTTQTVLKPKVYVRARPKDMNGQGALLSGSVVDIG-SGAIENRGGLIAGREALILNAQN 1019 

Query: 300 LDNIGGRIHAQKSAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTAKSSQNAQGSSTY 359 

+ N+ G + + ADINGIAE LLL A NNI ++S +S+QN QGS 

Sbjct: 1020 IKNLQGDLQGKNIFAAAGSDITNTGSI-GAENALLLKASNNIESRSETRSNQNEQGSVRN 1078 

Query: 360 LDRMAGIYITGKEKGVLAAQAGKDINIIAGQISNQSDQGQTRLQAGRDINLDTVQTGKYQ 419 

+ R+AGIY+TG++ G + AG +I + A +++NQS+ GQT L AG DI DT + Q 
Sbjct: 1079 IGRVAGIYLTGRQNGSVLLDAGNNIVLTASELTNQSEDGQTVLNAGGDIRSDTTGISRNQ 1138 

Query: 420 EIHFDADNHTIRGSTNEVGSSIQTKGDVTLLSGNNLNAKAAEVGSAKGTLAVYAKNDITI 479 

FD+DN+ IR NEVGS+ I+T+G+++L + ++ +AAEVGS +G L + A DI + 
Sbjct: 1139 NTIFDSDNYVIRKEQNEVGSTIRTRGNLSLNAKGDIRIRAAEVGSEQGRLKLAAGRDIKV 1198 

Query: 480 SSGIHAGQVDDASKHTGRSGGGNKLVITDKAQSHHETAQSSTFEGKQVVLQAGNDANILG 539 

+G + +DA K+TGRSGGG K +T ++ + AST +GK+++L +G D + G 
Sbjct: 1199 EAGKAHTETEDALKYTGRSGGGIKQKMTRHLKNQNGQAVSGTLDGKEIILVSGRDITVTG 1258 

Query: 540 SNVISDNGTRIQAGNHVRIGTTQTQSQSETYHQTQKSGLM-SAGIGFTIGSKTNTQENQS 598 

SN+I+DN T + A N++ + +T+S+S ++ +KSGLM S GIGFT GSK +TQ N+S 
Sbjct: 1259 SNIIADNHTILSAKNNIVLKAAETRSRSAEMNKKEKSGLMGSGGIGFTAGSKKDTQTNRS 1318 

Query: 599 QSNEHTGSTVGSLKGDTTIVASKHYEQTGSNVSSPEGNNLISTQSMDIGAAQNQLNSKTT 658 

++ HT S VGSL G+T I A KHY QTGS +SSP+G+ IS+ + I AAQN+ + ++ 
Sbjct: 1319 ETVSHTESVVGSLNGNTLISAGKHYTQTGSTISSPQGDVGISSGKISIDAAQNRYSQESK 1378 

Query: 659 QTYEQKGLTVAFSSPVTD 676 

Q YEQKG+TVA S PV + 
Sbjct: 1379 QVYEQKGVTVAISVPVVN 1396 

Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 63 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 523>: 

1 ATGATTTACA TCGTACTGTT TCTAGCTGTC GTCCTCGCCG TTGTCGCCTA 

51 CAACATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG 

101 GACACTCCGA CAAAGATGCC CTGCTCAACA GCAwAACCAG CCATGTCCGC 

151 GACGGCAAAC CGTCCGGCGG GTCAGTCATG ATGCCGAAAC CCCAACCGGC 

201 GGTCAAAAAA ACGGCAAAAC CCCAAGACCC CGyCATGCGC AACCTGCAAG 

251 AACAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG 

301 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAAGCGGCA TTATCGGCAA 

351 CTCCGCCCAC ACCGTTTCCG AACCCCAAAC CGGACATTCC GCAACGAAAC 

401 CTGCCGACGC GTCGGCAAAA CCTGCACCCG TTCCGCAAAC ACCTGCAAAA 

451 CCGCTGATTA CGCTCAAAGA ACTGTCAAAA GTCGAATTAT CCTGGTTTGA 

501 CGTGCGCATC GACTTCATCT CCTAT... 

This corresponds to the amino acid sequence <SEQ ID 524; ORF119>: 

1 MIYIVLFLAV VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSXTSHVR 

51 DGKPSGGSVM MPKPQPAVKK TAKPQDPXMR NLQEQDAVYI AKQKQAKASP 

101 FKTEIETALE ESGIIGNSAH TVSEPQTGHS ATKPADASAK PAPVPQTPAK 

151 PLITLKELSK VELSWFDVRI DFISY. . . 
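
The correspondence between the partial nucleotide sequence <SEQ ID 523> and the amino acid sequence <SEQ ID 524> can be checked with a short translation routine. The following Python sketch is illustrative only and is not part of the original analysis. It assumes the standard genetic code and a reading frame starting at nucleotide 1, and it reports any codon that contains an ambiguity code (such as the w and y in the listing above) as X. The names CODON_TABLE and translate are hypothetical.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG ordering
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; incomplete trailing codons are dropped.

    Non-letter characters (spaces, digits) are ignored so that a listing line
    can be passed in directly. Codons containing anything other than A, C, G
    or T (for example the IUPAC codes w or y) are reported as 'X'.
    """
    seq = "".join(ch for ch in dna.upper() if ch.isalpha())
    protein = []
    for i in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# First line of the partial sequence: 50 nt, i.e. 16 complete codons.
print(translate("ATGATTTACA TCGTACTGTT TCTAGCTGTC GTCCTCGCCG TTGTCGCCTA"))
# Nucleotides 130-138, spanning the ambiguous 'w' at position 134.
print(translate("AGCAwAACC"))
```

Under these assumptions, the second call shows why residue 45 of ORF119 is written as X: the codon containing the ambiguous base cannot be assigned to a single amino acid.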

Further work revealed the complete nucleotide sequence <SEQ ID 525>: 
1 ATGATTTACA TCGTACTGTT TCTAGCTGTC GTCCTCGCCG TTGTCGCCTA 

51 CAACATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG 

101 GACACTCCGA CAAAGATGCC CTGCTCAACA GCAAAACCAG CCATGTCCGC 

151 GACGGCAAAC CGTCCGGCGG GTCAGTCATG ATGCCGAAAC CCCAACCGGC 

201 GGTCAAAAAA ACGGCAAAAC CCCAAGACCC CGCCATGCGC AACCTGCAAG 

251 AACAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG 

301 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAAGCGGCA TTATCGGCAA 

351 CTCCGCCCAC ACCGTTTCCG AACCCCAAAC CGGACATTCC GCACCGAAAC 

401 CTGCCGACGC GCCGGCAAAA CCTGCACCCG TTCCGCAAAC ACCTGCAAAA 

451 CCGCTGATTA CGCTCAAAGA ACTGTCAAAA GTCGAATTAC CCTGGTTTGA 

501 CGTGCGCTTC GACTTCATCT CCTATATCGC GCTGACCGAA GCCAAAGAAC 

551 TGCACGCACT GCCGCGCCTT TCCAACCGCT GCCGCTACCA GATTGTCGGC 

601 TGCACCATGG ACGACCATTT CCAGATTGCC GAACCCATCC CGGGCATCCG 

651 CTATCAGGCA TTTATCGTGG GTATTCAGGC AGTCAGCCGC AACGGACTTG 

701 CCTCGCAGGA AGAACTCTCC GCATTCAACC GCCAGGTGGA CGCATTCGCA 

751 CAAAGCATGG GCGGTCAGAC GCTGCACACC GACCTTGCCG CCTTTATCGA 

801 AGTGGCTTCC GCACTGGACG CATTCTGCGC GCGCGTCGAC CAGACCATCG 

851 CCATCCATTT GGTTTCCCCG ACCAGCATCA GCGGCGTAGA ACTGCGTTCC 

901 GCCGTAACGG GCGTGGGTTT CGTTTTGGAA GACGACGGCG CGTTCCACTA 

951 TACCGACACG TCGGGCTCGA CCATGTTCTC CATCTGCTCG CTCAACAACG 

1001 AGCCGTTTAC CAACGCCCTT TTGGACAACC AGTCCTACAA AGGCTTCAGT 

1051 ATGCTGCTCG ACATCCCGCA CTCTCCGGCA GGCGAAAAAA CCTTCGACGA 

1101 TTTGTTTATG GATTTGGCGG TACGCCTGTC CGGCCAGTTG AACCTGAATC 

1151 TGGTCAACGA CAAAATGGAA GAAGTTTCGA CCCAATGGCT CAAAGACGTG 

1201 CGCACTTATG TATTGGCGCG TCAGTCCGAG ATGCTCAAAG TCGGTATCGA 

1251 ACCGGGCGGC AAAACCGCAT TGCGCCTGTT CTCCTAA 
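
Listings such as the one above are printed as a position number followed by blocks of ten residues. A minimal Python sketch for turning such a listing back into a contiguous sequence is given below. It is an illustration under stated assumptions, not a tool used in the original work: the function name parse_listing is hypothetical, leading digit groups on a line are read as one (possibly split) position number, and any line whose position number disagrees with the running residue count is reported rather than corrected.

```python
import re


def parse_listing(listing: str):
    """Return (sequence, mismatches) for a numbered, block-formatted listing.

    mismatches is a list of (declared_position, expected_position) pairs for
    lines whose leading number does not equal 1 + residues read so far.
    """
    parts = []
    mismatches = []
    for raw in listing.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Leading digit groups (e.g. "751" or a split "7 51") form the position.
        m = re.match(r"^((?:\d+\s+)+)(.*)$", line)
        if m:
            declared = int(re.sub(r"\s+", "", m.group(1)))
            body = m.group(2)
        else:
            declared, body = None, line
        # Keep letters and the stop symbol; drop spaces and stray punctuation.
        residues = re.sub(r"[^A-Za-z*]", "", body)
        expected = sum(len(p) for p in parts) + 1
        if declared is not None and declared != expected:
            mismatches.append((declared, expected))
        parts.append(residues)
    return "".join(parts), mismatches


example = """1 ATGATTTACA TCGTACTGTT TCTAGCTGTC GTCCTCGCCG TTGTCGCCTA

51 CAACATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG"""
sequence, problems = parse_listing(example)
print(len(sequence), problems)
```

Reporting mismatches instead of silently accepting them is a deliberate choice here: a disagreement usually means that a block was lost or that a stray number was attached to the line.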

This corresponds to the amino acid sequence <SEQ ID 526; ORF119-1>: 

1 MIYIVLFLAV VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSKTSHVR 

51 DGKPSGGSVM MPKPQPAVKK TAKPQDPAMR NLQEQDAVYI AKQKQAKASP 

101 FKTEIETALE ESGIIGNSAH TVSEPQTGHS APKPADAPAK PAPVPQTPAK 

151 PLITLKELSK VELPWFDVRF DFISYIALTE AKELHALPRL SNRCRYQIVG 

201 CTMDDHFQIA EPIPGIRYQA FIVGIQAVSR NGLASQEELS AFNRQVDAFA 

251 QSMGGQTLHT DLAAFIEVAS ALDAFCARVD QTIAIHLVSP TSISGVELRS 

301 AVTGVGFVLE DDGAFHYTDT SGSTMFSICS LNNEPFTNAL LDNQSYKGFS 

351 MLLDIPHSPA GEKTFDDLFM DLAVRLSGQL NLNLVNDKME EVSTQWLKDV 

401 RTYVLARQSE MLKVGIEPGG KTALRLFS* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF119 shows 93.7% identity over a 175aa overlap with an ORF (ORF119a) from strain A of N. 
meningitidis: 

                    10        20        30        40        50        60
orf119.pep  MIYIVLFLAVVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSXTSHVRDGKPSGGSVM
            |||||||||:|||||||||||||||||||||||||||||||||| |||||||||||| ||
orf119a     MIYIVLFLAAVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf119.pep  MPKPQPAVKKTAKPQDPXMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH
            ||||||||||||| ||| ||||||||||||||||||||||||||||||||||||||||||
orf119a     MPKPQPAVKKTAKSQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH
                    70        80        90       100       110       120

                   130       140       150       160       170
orf119.pep  TVSEPQTGHSATKPADASAKPAPVPQTPAKPLITLKELSKVELSWFDVRIDFISY
            || |||||||| ||||| |||:||||||||||||||||||||| |||||:|||||
orf119a     TVPEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE
                   130       140       150       160       170       180

orf119a     AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS
                   190       200       210       220       230       240

The complete length ORF119a nucleotide sequence <SEQ ID 527> is: 

1 ATGATTTACA TCGTACTGTT CCTCGCCGCC GTCCTCGCCG TTGTCGCCTA 

51 CAATATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG 

101 GGCACTCCGA CAAAGATGCC CTGCTCAACA GCAAAACCAG CCATGTCCGC 

151 GACGGCAAAC CGTCCGGCGG GCCAGTCATG ATGCCGAAAC CCCAACCGGC 

201 GGTCAAAAAA ACGGCAAAAT CCCAAGACCC CGCCATGCGC AACCTGCAAG 

251 AGCAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG 

301 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAAGCGGCA TTATCGGCAA 

351 CTCCGCCCAC ACCGTTCCCG AACCCCAAAC CGGACATTCC GCACCAAAAC 

401 CTGCCGACGC GCCGGCAAAA CCTGTTCCCG TTCCGCAAAC GCCGGCAAAA 

451 CCGCTGATTA CGCTCAAAGA GCTGTCGAAG GTCGAGCTGC CCTGGTTTGA 

501 CGTGCGCTTC GACTTCATCT CTTATATCGC GCTGACCGAA GCCAAAGAAC 

551 TGCACGCACT GCCGCGCCTT TCCAACCGCT GCCGCTACCA GATTGTCGGC 

601 TGCACCATGG ACGACCATTT CCAGATTGCC GAACCCATCC CGGGCATCCG 

651 CTATCAGGCA TTTATCGTGG GTATTCAGGC AGTCAGCCGC AACGGACTTG 

701 CCTCGCAGGA AGAACTCTCC GCATTCAACC GCCAGGTGGA TGCATTCGCA 

751 CACAGCATGG GCGGTCAGAC GCTGCACACC GACCTTGCCG CCTTTATCGA 

801 AGTGGCTTCC GCACTGGACG CATTCTGCGC GCGCGTCGAC CAGACTATCG 

851 CCATCCATTT GGTTTCCCCG ACCAGCATCA GCGGCGTAGA ACTGCGTTCC 

901 GCCGTAACGG GCGTGGGTTT CGTTTTGGAA GACGACGGCG CGTTCCACTA 

951 TACCGACACG TCGGGCTCGA CCATGTTCTC CATCTGCTCG CTCAACAACG 

1001 AGCCGTTTAC CAATGCCCTT TTGGACAACC AGTCCTATAA AGGCTTCAGT 

1051 ATGCTGCTCG ACATCCCGCA CTCTCCGGCA GGCGAAAAAA CCTTCGACGA 

1101 TTTGTTTATG GATTTGGCGG TACGCCTGTC CGGCCAGTTG AACCTGAATC 

1151 TGGTCAACGA CAAAATGGAA GAAGTTTCGA CCCAATGGCT CAAAGACGTG 

1201 CGCACTTATG TATTGGCTCG TCAGTCCGAG ATGCTCAAAG TCGGTATCGA 

1251 ACCGGGCGGC AAAACCGCAT TGCGCCTGTT CTCCTAA 

This encodes a protein having amino acid sequence <SEQ ID 528>: 



1 MIYIVLFLAA VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSKTSHVR 

51 DGKPSGGPVM MPKPQPAVKK TAKSQDPAMR NLQEQDAVYI AKQKQAKASP 

101 FKTEIETALE ESGIIGNSAH TVPEPQTGHS APKPADAPAK PVPVPQTPAK 

151 PLITLKELSK VELPWFDVRF DFISYIALTE AKELHALPRL SNRCRYQIVG 

201 CTMDDHFQIA EPIPGIRYQA FIVGIQAVSR NGLASQEELS AFNRQVDAFA 

251 HSMGGQTLHT DLAAFIEVAS ALDAFCARVD QTIAIHLVSP TSISGVELRS 

301 AVTGVGFVLE DDGAFHYTDT SGSTMFSICS LNNEPFTNAL LDNQSYKGFS 

351 MLLDIPHSPA GEKTFDDLFM DLAVRLSGQL NLNLVNDKME EVSTQWLKDV 

401 RTYVLARQSE MLKVGIEPGG KTALRLFS* 

ORF119a and ORF119-1 show 98.6% identity in 428 aa overlap: 



                    10        20        30        40        50        60
orf119a.pep MIYIVLFLAAVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM
            |||||||||:||||||||||||||||||||||||||||||||||||||||||||||| ||
orf119-1    MIYIVLFLAVVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGSVM
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf119a.pep MPKPQPAVKKTAKSQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH
            ||||||||||||| ||||||||||||||||||||||||||||||||||||||||||||||
orf119-1    MPKPQPAVKKTAKPQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf119a.pep TVPEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE
            || ||||||||||||||||||:||||||||||||||||||||||||||||||||||||||
orf119-1    TVSEPQTGHSAPKPADAPAKPAPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf119a.pep AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf119-1    AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf119a.pep AFNRQVDAFAHSMGGQTLHTDLAAFIEVASALDAFCARVDQTIAIHLVSPTSISGVELRS
            ||||||||||:|||||||||||||||||||||||||||||||||||||||||||||||||
orf119-1    AFNRQVDAFAQSMGGQTLHTDLAAFIEVASALDAFCARVDQTIAIHLVSPTSISGVELRS
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf119a.pep AVTGVGFVLEDDGAFHYTDTSGSTMFSICSLNNEPFTNALLDNQSYKGFSMLLDIPHSPA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf119-1    AVTGVGFVLEDDGAFHYTDTSGSTMFSICSLNNEPFTNALLDNQSYKGFSMLLDIPHSPA
                   310       320       330       340       350       360

                   370       380       390       400       410       420
orf119a.pep GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf119-1    GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG
                   370       380       390       400       410       420

                  429
orf119a.pep KTALRLFSX
            |||||||||
orf119-1    KTALRLFSX

Homology with a predicted ORF from N. gonorrhoeae 
ORF119 shows 93.1% identity over a 175aa overlap with a predicted ORF (ORF119ng) from 
N. gonorrhoeae: 

orf119.pep  MIYIVLFLAVVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSXTSHVRDGKPSGGSVM 60
            |||||||||:|||||||||||||||||||||||||||||||||| |||||||||||| ||
orf119ng    MIYIVLFLAAVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM 60

orf119.pep  MPKPQPAVKKTAKPQDPXMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH 120
            |||||||||| |||||  ||||||||||||||||||||||||||||||||| ||||||||
orf119ng    MPKPQPAVKKPAKPQDSAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEEIGIIGNSAH 120

orf119.pep  TVSEPQTGHSATKPADASAKPAPVPQTPAKPLITLKELSKVELSWFDVRIDFISY 175
            ||||||||||| ||||| |||:||||||||||||||||||||| |||||:|||||
orf119ng    TVSEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE 180

The complete length ORF119ng nucleotide sequence <SEQ ID 529> is: 

1 ATGATTTACA TCGTACTGTT CCTCGCCGCC GTCCTCGCCG TTGTCGCCTA 

51 CAATATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG 

101 GACACTCCGA CAAAGATGCC CTGCTCAACA GCAAAACCAG CCATGTCCGC 

151 GACGGCAAAC CGTCCGGCGG GCCAGTCATG ATGCCGAAAC CCCAACCGGC 

201 GGTCAAAAAA CCGGCCAAAC CCCAAGACTC CGCCATGCGC AACCTGCAAG 

251 AACAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG 

301 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAATCGGCA TTATCGGCAA 

351 CTCCGCCCAC ACCGTTTCCG AACCCCAAAC CGGACATTCC GCACCGAAAC 

401 CTGCCGACGC GCCGGCAAAA CCCGTTCCCG TTCCGCAAAC GCCGGCAAAA 

451 CCGCTGATTA CGCTCAAAGA GCTGTCGAAG GTCGAGCTGC CCTGGTTTGA 

501 CGTGCGCTtc gACTTCATCT CCTATATCGC GCTGACCGAA GCCAAAGAAC 

551 TGCACGCACT GCCGCGCCTT tCCAACCGCT GCCGCTACCA GATTGTCGGC 

601 TGCACCATGG ACGACCATTT CCAGATTGCC GAACCCATCC CGGGCATCCG 

651 CTATCAGGCA TTTATCGTGG GTATCCAGGC AGTCAGCCGC AACGGACTTG 

701 CCTCGCAGGA AGAACTCTCC GCATTCAACC GCCAGGCGGA CGCATTCGCA 

751 CAAAGCATGG GCGGTCAGAC GCTGCACACC GACCTTGCCG CCTTTATCGA 

801 AGTGGCTTCC GCACTGGACG CATTCTGCGC GCGCGTCGAC CAGACCATCG 

851 CCATCCATTT GGTTTCGCCG ACCAGCATCA GCGGCGTAGA ACTGCGTTCC 

901 GCCGTAACGG GCGTGGGTTT CGTTTTGGAA GACGACGGCG CGTTCCACTA 

951 TACCGACACG TCGGGCTCGA CCATGTTCTC CATCTGCTCG CTCAACAACG 

1001 AGCCGTTTAC CAATGCCCTT TTGGACAACC AGTCCTACAA AGGCTTCAGT 

1051 ATGCTGCTCG ACATCCCGCA CTCTCCGGCA GGCGAAAAAA CCTTCGACGA 

1101 TTTGTTTATG GATTTGGCGG TACGCCTGTC CGGTCAGTTG AACCTGAATC 

1151 TGGTCAACGA CAAAATGGAA GAAGTTTCGA CCCAATGGCT CAAAGACGTA 

1201 CGCACTTATG TATTGGCGCG TCAGTCCGAG ATGCTCAAAG TCGGTATCGA 

1251 ACCGGGCGGC AAAACCGCCC TGCGCCTGTT TTCATAA 

This encodes a protein having amino acid sequence <SEQ ID 530>: 



1 MIYIVLFLAA VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSKTSHVR 

51 DGKPSGGPVM MPKPQPAVKK PAKPQDSAMR NLQEQDAVYI AKQKQAKASP 

101 FKTEIETALE EIGIIGNSAH TVSEPQTGHS APKPADAPAK PVPVPQTPAK 

151 PLITLKELSK VELPWFDVRF DFISYIALTE AKELHALPRL SNRCRYQIVG 

201 CTMDDHFQIA EPIPGIRYQA FIVGIQAVSR NGLASQEELS AFNRQADAFA 

251 QSMGGQTLHT DLAAFIEVAS ALDAFCARVD QTIAIHLVSP TSISGVELRS 

301 AVTGVGFVLE DDGAFHYTDT SGSTMFSICS LNNEPFTNAL LDNQSYKGFS 

351 MLLDIPHSPA GEKTFDDLFM DLAVRLSGQL NLNLVNDKME EVSTQWLKDV 

401 RTYVLARQSE MLKVGIEPGG KTALRLFS* 

ORF119ng and ORF119-1 show 98.4% identity over 428 aa overlap: 

                    10        20        30        40        50        60
orf119ng    MIYIVLFLAAVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM
            |||||||||:||||||||||||||||||||||||||||||||||||||||||||||| ||
orf119-1    MIYIVLFLAVVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGSVM
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf119ng    MPKPQPAVKKPAKPQDSAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEEIGIIGNSAH
            |||||||||| ||||| |||||||||||||||||||||||||||||||||| ||||||||
orf119-1    MPKPQPAVKKTAKPQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf119ng    TVSEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE
            |||||||||||||||||||||:||||||||||||||||||||||||||||||||||||||
orf119-1    TVSEPQTGHSAPKPADAPAKPAPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf119ng    AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf119-1    AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf119ng    AFNRQADAFAQSMGGQTLHTDLAAFIEVASALDAFCARVDQTIAIHLVSPTSISGVELRS
            |||||:||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf119-1    AFNRQVDAFAQSMGGQTLHTDLAAFIEVASALDAFCARVDQTIAIHLVSPTSISGVELRS
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf119ng    AVTGVGFVLEDDGAFHYTDTSGSTMFSICSLNNEPFTNALLDNQSYKGFSMLLDIPHSPA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf119-1    AVTGVGFVLEDDGAFHYTDTSGSTMFSICSLNNEPFTNALLDNQSYKGFSMLLDIPHSPA
                   310       320       330       340       350       360

                   370       380       390       400       410       420
orf119ng    GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf119-1    GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG
                   370       380       390       400       410       420

                  429
orf119ng    KTALRLFSX
            |||||||||
orf119-1    KTALRLFSX



Based on this analysis, including the presence of a putative leader sequence in the gonococcal 
protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, 
could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
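
The identity figures quoted in this example (for instance 98.6% identity in a 428 aa overlap) are the number of identical aligned positions divided by the overlap length. The Python sketch below shows that arithmetic on two already-aligned strings of equal length. It is illustrative only: the function name percent_identity is hypothetical, gap columns are assumed to be written as "-", and the original figures were produced by the alignment program itself, not by this code. The two strings are the first 60 aligned residues of ORF119a and ORF119-1.

```python
def percent_identity(a: str, b: str):
    """Return (matches, overlap, percent) for two aligned, equal-length strings.

    Columns in which either sequence has a gap ('-') are excluded from the
    overlap, so the percentage is taken over residue-to-residue columns only.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no overlapping residues")
    matches = sum(1 for x, y in columns if x == y)
    return matches, len(columns), round(100.0 * matches / len(columns), 1)


orf119a_row = "MIYIVLFLAAVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM"
orf119_1_row = "MIYIVLFLAVVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGSVM"
print(percent_identity(orf119a_row, orf119_1_row))
```

Applied to a full-length alignment, 422 identical positions out of 428 would round to 98.6%, and 421 out of 428 to 98.4%.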



Example 64 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 531>: 

1 GCGCGGCACG GCACGGAAGA TTTCTTCATG AACAACAGCG ACAC.ATCAG 
51 GCAGATAGTC GAAAGCACCA CCGGTACGAT GAAGCTGCTG ATTTCCTCCA 
101 TCGCCCTGAT TTCATTGGTA GTCGGCGGCA TCGGCGTGAT GAACATCATG 
151 CTGGTGTCCG TTACCGAGCG CACCAAAGAA ATCGGCATAC GGATGGCAAT 
201 CGGCGCGCGG CGCGGCAATA TTTyGCAGCA GTTTTTGATT GAGGCGGTGT 
251 TAATCTGCGT CATCGGCGGT TTGGTCGGCG TGGGTTTGTC CGCCGCCGTC 
301 AGCCTCGTGT TCAATCATTT TGTAACCGAC TTCCCGATGG ACATTTCCGC 
351 CATGTCCGTC ATCGGCGCGG TCGCCTGTTC GACCGGAATC GGCATCGCGT 
401 TCGGCTTTAT GCCTGCCAAT AAAGCAGCCA AACTCAATCC GATAGACGCA 
451 TTGGCACAGG ATTGA 

This corresponds to the amino acid sequence <SEQ ID 532; ORF134>: 

1 ..ARHGTEDFFM NNSDXIRQIV ESTTGTMKLL ISSIALISLV VGGIGVMNIM 
51 LVSVTERTKE IGIRMAIGAR RGNIXQQFLI EAVLICVIGG LVGVGLSAAV 
101 SLVFNHFVTD FPMDISAMSV IGAVACSTGI GIAFGFMPAN KAAKLNPIDA 
151 LAQD* 

Further work revealed the complete nucleotide sequence <SEQ ID 533>: 

1 ATGTCGGTGC AAGCAGTATT GGCGCACAAA ATGCGTTCGC TTCTGACGAT 

51 GCTCGGCATC ATCATCGGTA TCGCGTCGGT GGTTTCCGTC GTCGCATTGG 

101 GCAATGGTTC GCAGAAAAAA ATCCTTGAAG ACATCAGTTC GATAGGGACG 

151 AACACCATCA GCATCTTCCC GGGGCGCGGC TTCGGCGACA GGCGCAGCGG 

201 CAGGATTAAA ACCCTGACCA TAGACGACGC AAAAATCATC GCCAAACAAA 

251 GCTACGTTGC TTCCGCCACG CCCATGACTT CGAGCGGCGG CACGCTGACT 

301 TACCGCAACA CCGACCTGAC CGCCTCGCTT TACGGCGTGG GCGAACAATA 

351 TTTCGACGTG CGCGGACTGA AGCTGGAAAC GGGGCGGCTG TTTGACGAAA 

401 ACGATGTGAA AGAAGACGCG CAGGTCGTCG TCATCGACCA AAATGTCAAA 

451 GACAAACTCT TTGCGGACTC GGATCCGTTG GGTAAAACCA TTTTGTTCAG 

501 GAAACGCCCC TTGACCGTCA TCGGCGTGAT GAAAAAAGAC GAAAACGCTT 

551 TCGGCAATTC CGACGTGCTG ATGCTTTGGT CGCCCTATAC GACGGTGATG 

601 CACCAAATCA CAGGCGAGAG CCACACCAAC TCCATCACCG TCAAAATCAA 

651 AGACAATGCC AATACCCAGG TTGCCGAAAA AGGGCTGACC GATCTGCTCA 

701 AAGCGCGGCA CGGCACGGAA GATTTCTTCA TGAACAACAG CGACAGCATC 

751 AGGCAGATAG TCGAAAGCAC CACCGGTACG ATGAAGCTGC TGATTTCCTC 

801 CATCGCCCTG ATTTCATTGG TAGTCGGCGG CATCGGCGTG ATGAACATCA 

851 TGCTGGTGTC CGTTACCGAG CGCACCAAAG AAATCGGCAT ACGGATGGCA 

901 ATCGGCGCGC GGCGCGGCAA TATTTTGCAG CAGTTTTTGA TTGAGGCGGT 

951 GTTAATCTGC GTCATCGGCG GTTTGGTCGG CGTGGGTTTG TCCGCCGCCG 

1001 TCAGCCTCGT GTTCAATCAT TTTGTAACCG ACTTCCCGAT GGACATTTCC 

1051 GCCATGTCCG TCATCGGCGC GGTCGCCTGT TCGACCGGAA TCGGCATCGC 

1101 GTTCGGCTTT ATGCCTGCCA ATAAAGCAGC CAAACTCAAT CCGATAGACG 

1151 CATTGGCACA GGATTGA 

This corresponds to the amino acid sequence <SEQ ID 534; ORF134-1>: 

1 MSVQAVLAHK MRSLLTMLGI IIGIASVVSV VALGNGSQKK ILEDISSIGT
51 NTISIFPGRG FGDRRSGRIK TLTIDDAKII AKQSYVASAT PMTSSGGTLT
101 YRNTDLTASL YGVGEQYFDV RGLKLETGRL FDENDVKEDA QVVVIDQNVK
151 DKLFADSDPL GKTILFRKRP LTVIGVMKKD ENAFGNSDVL MLWSPYTTVM
201 HQITGESHTN SITVKIKDNA NTQVAEKGLT DLLKARHGTE DFFMNNSDSI
251 RQIVESTTGT MKLLISSIAL ISLVVGGIGV MNIMLVSVTE RTKEIGIRMA
301 IGARRGNILQ QFLIEAVLIC VIGGLVGVGL SAAVSLVFNH FVTDFPMDIS
351 AMSVIGAVAC STGIGIAFGF MPANKAAKLN PIDALAQD*

Computer analysis of this amino acid sequence gave the following results: 

Homology with the hypothetical protein o648 of E. coli (accession number AE000189) 
ORF134 and o648 protein show 45% aa identity in 153aa overlap: 

Orf134:   2 RHGTEDFFMNNSDXIRQIVESTTGTMKXXXXXXXXXXXVVGGIGVMNIMLVSVTERTKEI 61
            RHG +DFF  N D + + VE TT T++           VVGGIGVMNIMLVSVTERT+EI
o648:   496 RHGKKDFFTWNMDGVLKTVEKTTRTLQLFLTLVAVISLVVGGIGVMNIMLVSVTERTREI 555

Orf134:  62 GIRMAIGARRGNIXQQFLIEAXXXXXXXXXXXXXXXXXXXXXFNHFVTDFPMDISAMSVI 121
            GIRMA+GAR  ++ QQFLIEA                        F+  + +  S ++++
o648:   556 GIRMAVGARASDVLQQFLIEAVLVCLVGGALGITLSLLIAFTLQLFLPGWEIGFSPLALL 615

Orf134: 122 GAVACSTGIGIAFGFMPANKAAKLNPIDALAQD 154
             A  CST  GI FG++PA  AA+L+P+DALA++
o648:   616 LAFLCSTVTGILFGWLPARNAARLDPVDALARE 648

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF134 shows 98.7% identity over a 154aa overlap with an ORF (ORF134a) from strain A of N. 
meningitidis: 

                                                  10        20        30
orf134.pep                                ARHGTEDFFMNNSDXIRQIVESTTGTMKLL
                                          |||||||||||||| |||||||||||||||
orf134a     GESHTNSITVKIKDNANTQVAEKGLTDLLKARHGTEDFFMNNSDSIRQIVESTTGTMKLL
               210       220       230       240       250       260

                    40        50        60        70        80        90
orf134.pep  ISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMAIGARRGNIXQQFLIEAVLICVIGG
            |||||||||||||||||||||||||||||||||||||||||||| |||||||||||||||
orf134a     ISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMAIGARRGNILQQFLIEAVLICVIGG
               270       280       290       300       310       320

                   100       110       120       130       140       150
orf134.pep  LVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVACSTGIGIAFGFMPANKAAKLNPIDA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134a     LVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVACSTGIGIAFGFMPANKAAKLNPIDA
               330       340       350       360       370       380

orf134.pep  LAQDX
            |||||
orf134a     LAQDX

The complete length ORF134a nucleotide sequence <SEQ ID 535> is: 

1 ATGTCGGTGC AAGCAGTATT GGCGCACAAA ATGCGTTCGC TTCTGACGAT 

51 GCTCGGCATC ATCATCGGTA TCGCTTCGGT TGTCTCCGTC GTCGCATTGG 

101 GCAACGGTTC GCAGAAAAAA ATCCTTGAAG ACATCAGTTC GATAGGGACG 

151 AACACCATCA GCATCTTCCC AGGGCGCGGC TTCGGCGACA GGCGCAGCGG 

201 CAGGATTAAA ACCCTGACCA TAGACGACGC AAAAATCATC GCCAAACAAA 

251 GCTACGTTGC TTCCGCCACG CCCATGACTT CGAGCGGCGG CACGCTGACT 

301 TACCGCAATA CCGACCTGAC CGCTTCTTTG TACGGTGTGG GCGAACAATA 

351 TTTCGACGTG CGCGGGCTGA AGCTGGAAAC GGGGCGGCTG TTTGACGAAA 

401 ACGATGTGAA AGAAGACGCG CAGGTCGTCG TCATCGACCA AAATGTCAAA 

451 GACAAACTCT TTGCGGACTC GGATCCGTTG GGTAAAACCA TTTTGTTCAG 

501 GAAACGCCCC TTGACCGTCA TCGGCGTGAT GAAAAAAGAC GAAAACGCTT 

551 TCGGCAATTC CGACGTGCTG ATGCTTTGGT CGCCCTATAC GACGGTGATG 

601 CACCAAATCA CAGGCGAGAG CCACACCAAC TCCATCACCG TCAAAATCAA 

651 AGACAATGCC AATACCCAGG TTGCCGAAAA AGGGCTGACC GATCTGCTCA 

701 AAGCGCGGCA CGGCACGGAA GATTTCTTCA TGAACAACAG CGACAGCATC 

751 AGGCAGATAG TCGAAAGCAC CACCGGTACG ATGAAGCTGC TGATTTCCTC 

801 CATCGCCCTG ATTTCATTGG TAGTCGGCGG CATCGGCGTG ATGAACATCA 

851 TGCTGGTGTC CGTTACCGAG CGCACCAAAG AAATCGGCAT ACGGATGGCA 

901 ATCGGCGCGC GGCGCGGCAA TATTTTGCAG CAGTTTTTGA TTGAGGCGGT 

951 GTTAATCTGC GTCATCGGCG GTTTGGTCGG CGTGGGTTTG TCCGCCGCCG 

1001 TCAGCCTCGT GTTCAATCAT TTTGTAACCG ACTTCCCGAT GGACATTTCC 

1051 GCCATGTCCG TCATCGGCGC GGTCGCCTGT TCGACCGGAA TCGGCATCGC 

1101 GTTCGGCTTT ATGCCTGCCA ATAAAGCAGC CAAACTCAAT CCGATAGATG 

1151 CATTGGCGCA GGATTGA 

This encodes a protein having amino acid sequence <SEQ ID 536>: 

1 MSVQAVLAHK MRSLLTMLGI IIGIASVVSV VALGNGSQKK ILEDISSIGT
51 NTISIFPGRG FGDRRSGRIK TLTIDDAKII AKQSYVASAT PMTSSGGTLT
101 YRNTDLTASL YGVGEQYFDV RGLKLETGRL FDENDVKEDA QVVVIDQNVK
151 DKLFADSDPL GKTILFRKRP LTVIGVMKKD ENAFGNSDVL MLWSPYTTVM
201 HQITGESHTN SITVKIKDNA NTQVAEKGLT DLLKARHGTE DFFMNNSDSI
251 RQIVESTTGT MKLLISSIAL ISLVVGGIGV MNIMLVSVTE RTKEIGIRMA
301 IGARRGNILQ QFLIEAVLIC VIGGLVGVGL SAAVSLVFNH FVTDFPMDIS
351 AMSVIGAVAC STGIGIAFGF MPANKAAKLN PIDALAQD*

ORF134a and ORF134-1 show 100.0% identity in 388 aa overlap: 

orf134a.pep MSVQAVLAHKMRSLLTMLGIIIGIASVVSVVALGNGSQKKILEDISSIGTNTISIFPGRG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1    MSVQAVLAHKMRSLLTMLGIIIGIASVVSVVALGNGSQKKILEDISSIGTNTISIFPGRG

orf134a.pep FGDRRSGRIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1    FGDRRSGRIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV

orf134a.pep RGLKLETGRLFDENDVKEDAQVVVIDQNVKDKLFADSDPLGKTILFRKRPLTVIGVMKKD
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1    RGLKLETGRLFDENDVKEDAQVVVIDQNVKDKLFADSDPLGKTILFRKRPLTVIGVMKKD

orf134a.pep ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTQVAEKGLTDLLKARHGTE
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1    ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTQVAEKGLTDLLKARHGTE

orf134a.pep DFFMNNSDSIRQIVESTTGTMKLLISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1    DFFMNNSDSIRQIVESTTGTMKLLISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMA

orf134a.pep IGARRGNILQQFLIEAVLICVIGGLVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVAC
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1    IGARRGNILQQFLIEAVLICVIGGLVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVAC

orf134a.pep STGIGIAFGFMPANKAAKLNPIDALAQDX
            |||||||||||||||||||||||||||||
orf134-1    STGIGIAFGFMPANKAAKLNPIDALAQDX

Homology with a predicted ORF from N. gonorrhoeae 

ORF134 shows 96.8% identity over a 154aa overlap with a predicted ORF (ORF134ng) from N. 
gonorrhoeae: 

orf134.pep                                ARHGTEDFFMNNSDXIRQIVESTTGTMKLL 30
                                          |||||||||||||| |||:|||||||||||
orf134ng    GESHTNSITVKIKDNANTRVAEKGLAELLKARHGTEDFFMNNSDSIRQMVESTTGTMKLL 264

orf134.pep  ISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMAIGARRGNIXQQFLIEAVLICVIGG 90
            |||||||||||||||||||||||||||||||||||||||||||| |||||||||||:|||
orf134ng    ISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMAIGARRGNILQQFLIEAVLICIIGG 324

orf134.pep  LVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVACSTGIGIAFGFMPANKAAKLNPIDA 150
            ||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||
orf134ng    LVGVGLSAAVSLVFNHFVTDFPMDISAASVIGAVACSTGIGIAFGFMPANKAAKLNPIDA 384

orf134.pep  LAQD 154
            ||||
orf134ng    LAQD 388

The complete length ORF134ng nucleotide sequence <SEQ ID 537> is:

1 ATGTCGGTGC AAGCAGTATT GGCGCACAAA ATGCGTTCGC TTCTGACCAT
51 GCTCGGCATC ATCATCGGTA TCGCTTCGGT TGTCTCCGTC GTCGCGCTGG
101 GCAACGGTTC GCAGAAAAAA ATCCTCGAAG ACATCAGTTC GATGGGGACG
151 AACACCATCA GCATCTTCCC CGGGCGCGGC TTCGGCGACA GGCGCAGCGG
201 CAAAATCAAA ACCCTGACCA TAGACGACGC AAAAATCATC GCCAAACAAA
251 GCTACGTTGC CTCCGCCACG CCCATGACTT CGAGCGGCGG CACGCTGACC
301 TACCGCAATA CCGACCTGAC CGCTTCTTTG TACGGTGTGG GCGAACAATA
351 TTTCGACGTG CGCGGGCTGA AGCTGGAAAC GGGGCGGCTG TTTGATGAGA
401 ACGATGTGAA AGAAGACGCG CAAGTCGTCG TCATCGACCA AAATGTCAAA
451 GACAAACTCT TTGCGGACTC GGATCCGTTG GGTAAAACCA TTTTGTTCAG
501 GAAACGCCCC TTGACCGTCA TCGGCGTGAT GAAAAAAGAC GAAAACGCTT
551 TCGGCAATTC CGACGTGCTG ATGCTTTGGT CGCCCTATAC GACGGTGATG
601 CACCAAATCA CAGGCGAGAG CCACACCAAC TCCATCACCG TCAAAATCAA
651 AGACAATGCC AATACCCGGG TTGCCGAAAA AGGGCTGGCC GAGCTGCTCA
701 AAGCACGGCA CGGCACGGAA GACTTCTTTA TGAACAACAG CGACAGCATC
751 AGGCAGATGG TCGAAAGCAC CACCGGTACG ATGAAGCTGC TGATTTCCTC
801 CATCGCCCTG ATTTCATTGG TAGTCGGCGG CATCGGTGTG ATGAACATTA
851 TGCTGGTGTC CGTTACCGAG CGCACCAAAG AAATCGGCAT ACGGATGGCA
901 ATCGGCGCGC GGCGCGGCAA TATTTTGCAG CAGTTTTTGA TTGAGGCGGT
951 GTTAATCTGC ATCATCGGAG GCTTGGTCGG CGTAGGTTTG TCCGCCGCCG
1001 TCAGCCTCGT GTTCAATCAT TTTGTAACCG ATTTCCCGAT GGACATTTCG
1051 GCGGCATCCG TTATCGGGGC GGTCGCCTGT TCGACCGGAA TCGGCATCGC
1101 GTTCGGCTTT ATGCCTGCCA ATAAGGCAGC CAAACTCAAT CCGATAGATG
1151 CATTGGCGCA GGATTGA
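The listing layout used throughout (a position number followed by blocks of ten letters) can be collapsed into a single string before any further analysis. The following is a minimal sketch, not part of the original disclosure; the helper name `parse_listing` is hypothetical, and it assumes each row starts with one unbroken position number, which is not true of every OCR-damaged row.

```python
import re

def parse_listing(text: str) -> str:
    """Join a numbered, block-formatted sequence listing into one string."""
    residues = []
    for line in text.splitlines():
        # Drop the leading position number, then the spaces between blocks.
        body = re.sub(r"^\s*\d+\s+", "", line)
        residues.append(body.replace(" ", ""))
    return "".join(residues)

# First two rows of the ORF134ng nucleotide listing above.
listing = """\
1 ATGTCGGTGC AAGCAGTATT GGCGCACAAA ATGCGTTCGC TTCTGACCAT
51 GCTCGGCATC ATCATCGGTA TCGCTTCGGT TGTCTCCGTC GTCGCGCTGG
"""
seq = parse_listing(listing)
print(len(seq))  # 100
```

Applied to the whole listing, the last row starts at position 1151 and holds 17 letters, so the sequence is 1167 nucleotides long, which is 388 codons plus a stop codon.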

This encodes a protein having amino acid sequence <SEQ ID 538>: 

1 MSVQAVLAHK MRSLLTMLGI IIGIASVVSV VALGNGSQKK ILEDISSMGT
51 NTISIFPGRG FGDRRSGKIK TLTIDDAKII AKQSYVASAT PMTSSGGTLT
101 YRNTDLTASL YGVGEQYFDV RGLKLETGRL FDENDVKEDA QVVVIDQNVK
151 DKLFADSDPL GKTILFRKRP LTVIGVMKKD ENAFGNSDVL MLWSPYTTVM
201 HQITGESHTN SITVKIKDNA NTRVAEKGLA ELLKARHGTE DFFMNNSDSI
251 RQMVESTTGT MKLLISSIAL ISLVVGGIGV MNIMLVSVTE RTKEIGIRMA
301 IGARRGNILQ QFLIEAVLIC IIGGLVGVGL SAAVSLVFNH FVTDFPMDIS
351 AASVIGAVAC STGIGIAFGF MPANKAAKLN PIDALAQD*
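The correspondence between a nucleotide listing and its amino acid sequence can be checked by translating the DNA with the standard genetic code. The sketch below is illustrative only; `translate` and `CODON_TABLE` are hypothetical names, and mapping any codon that contains an ambiguity code to `X` is an assumption made here to mirror the `X` residues in the listings.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def translate(dna: str) -> str:
    """Translate DNA in frame 1; codons with ambiguity codes become 'X'."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)

# First 30 nucleotides of <SEQ ID 537>.
print(translate("ATGTCGGTGCAAGCAGTATTGGCGCACAAA"))  # MSVQAVLAHK
```

The same call on nucleotides 1150 to 1167 of <SEQ ID 537> (`GCATTGGCGCAGGATTGA`) gives `ALAQD*`, matching the end of <SEQ ID 538>.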

ORF134ng and ORF134-1 show 97.9% identity in 388 aa overlap: 

orf 134ng MSVQAVLAHKMRSLLTMLGIIIGIASWSWALGNGSQKKILEDISSMGTNTISIFPGRG 
( I I i M I I I I I I I I I I t I II t i I I M I I ( I I I t I 1! II I It I I ! I I I :! I I I I I M 11 M 
25 orf 134-1 MSVQAVLAHKMRSLLTMLGI 1 I GIAS WS WALGNGSQKKILEDI S S IGTNT I S I FPGRG 

orfl34ng FGDRRSGKIKTLTIDDAKIIAKQSYVASAT PMTSSGGTLT YRNTDLTASLYGVGEQYFDV 

M I I I I t : I I I I I I I I M I I I M ! II I I I M M M I I I I I I I I I I I ! I I II I I I ! I I I I I 
orf 13 4-1 FGDRRSGRIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV 






orfl34ng RGLKLETGRLFDENDVKE DAQWVIDQNVKDKLFADSDPLGKTILFRKRPLTVIGVMKKD 

I I I) I I I I I I I i I I I II 1 I I 1 I I I II I 1 M I I I I I I I I II I ! M I II I I I M U I II 1 II 
orf 134-1 RGLKLETGRLFDENDVKE DAQVWIDQNVKDKLFADSDPLGKTILFRKRPLTVIGVMKKD 



35 orf 134ng ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTRVAEKGLAELLKARHGTE 

I I I I I I I I I I I i 1 I I I I t I I I I I I I I I 1 It I I M I M I I I I f : I I I I I I : : I I I M I I I I 
orf 13 4-1 ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTQVAEKGLTDLLKARHGTE 

orfl34ng DFFMNNSDSIRQMVESTTGTMKLLISSIALISLWGGIGVMNIMLVSVTERTKEIGIRMA 
40 I I I M I M M I I M I I M II I I I I I I I I I M I M M I I I I I II I I 11 I I I I I I II ! M I 

orf 134-1 DFFMNN SDS IRQIVE STTGTMKLL I S S I ALI SLWGG IGVMNIMLVSVTERTKE IG IRMA 

orfl34ng IGARRGNILQQFLIEAVLICIIGGLVGVGLSAAVSLVFNHFVTDFPMDISAASVTGAVAC 
I I M I I I I M I I I I I I I I II : I I I 1 I I II I I I I II I II I I I I I I I I I I I I I I 1 I I I II I 
45 orf 134-1 IGARRGNILQQFLIEAVLICVIGGLVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVAC 

orfl34ng S TG I G I A FG FM PAN KAAKLN P I DALAQDX 

I I II II II M I M I I I I I I I I I I I M I I I 
orf 134-1 STGIGIAFGFMPANKAAKLNPI DALAQDX 

ORF134ng also shows homology to an E. coli ABC transporter:

sp|P75831|YBJZ_ECOLI HYPOTHETICAL ABC TRANSPORTER ATP-BINDING PROTEIN YBJZ >gi5
(AE000189) o648; similar to YBBA_HAEIN SW: P45247 [Escherichia coli] Length =
648

Score = 297 bits (753), Expect = 6e-80
Identities = 162/389 (41%), Positives = 230/389 (58%), Gaps = 1/389 (0%)

Query: 1 MSVQAVLAHKMRSLLTMLXXXXXXXXXXXXXXLGNGSQKKILEDISSMGTNTI SI FPGRG 60 

M+ +A+ A+KMR+LLTML +G+ +++ +L DI S+GTNTI ++PG+ 

Sbjct: 2 60 MAWRALAANKMRTLLTMLG I I IG I AS WS IVVVG DAAKQMVLAD I RS IGTNT I D V YPGKD 319 



60 



Query: 61 FGDRRSGKIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV 120 

FGD + L DD I KQ +VASATP S L Y N D+ AS GV YF+V 

Sbjct: 320 FGDDDPQYQQALKYDDLIAIQKQPWVASATPAVSQNLRLRYNNVDVAASANGVSGDYFNV 379

Query: 121
G+ G F++ + AQWV+D N + +LF +D +G+ IL P VIGV ++
Sbjct: 380 YGMTFSEGNTFNQEQLNGRAQWVLDSNTRRQLFPHKADWGEVILVGNMPARVIGVAEE 439

Query: 180 DENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTRVAEKGLAELLKARHGT 239
++ FG+S VL +W PY+T+ ++ G+S NSITV++K+ ++ AE+ L LL RHG
Sbjct: 440 KQSMFGSSKVLRWLPYSTMSGRVMGQSWLNSITVRVKEGFDSAEAEQQLTRLLSLRHGK 499

Query: 240 EDFFMNNSDSIRQMVESTTGTMKXXXXXXXXXXXWGGIGVMNIMLVSVTERTKEIGIRM 299
+DFF N D + + VE TT T++ WGGIGVMNIMLVSVTERT+EIGIRM
Sbjct: 500 KDFFTWNMDGVLKTVEKTTRTLQLFLTLVAVISLVVGGIGVMNIMLVSVTERTREIGIRM 559

Query: 300 AIGARRGNILQQFLIEXXXXXXXXXXXXXXXXXXXXXXFNHFVTDFPMDISAASVIGAVA 359
A+GAR ++LQQFLIE F+ + + S +++ A
Sbjct: 560 AVGARASDVLQQFLIEAVLVCLVGGALGITLSLLIAFTLQLFLPGWEIGFSPLALLLAFL 619

Query: 360 CSTGIGIAFGFMPANKAAKLNPIDALAQD 388
CST GI FG++PA AA+L+P+DALA++
Sbjct: 620 CSTVTGILFGWLPARNAARLDPVDALARE 648

Based on this analysis, including the presence of the leader peptide and transmembrane regions in
the gonococcal protein, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
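Putative transmembrane regions of the kind mentioned above are often screened for with a sliding-window hydropathy average. This passage does not say which program was used, so the following is only an illustrative sketch that swaps in the Kyte-Doolittle scale; the function name `hydropathy`, the 19-residue window and the 1.6 cutoff are assumptions taken from common practice, not from this document.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy(seq: str, window: int = 19) -> list:
    """Mean hydropathy of every window; unknown residues (e.g. X) score 0."""
    scores = [KD.get(aa, 0.0) for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]

# Residues 261-290 of <SEQ ID 538>, a strongly hydrophobic stretch.
tm_candidate = "MKLLISSIALISLVVGGIGVMNIMLVSVTE"
# Residues 121-139 of <SEQ ID 538>, rich in charged residues.
soluble = "RGLKLETGRLFDENDVKED"

print(max(hydropathy(tm_candidate)) > 1.6)  # True: candidate membrane span
print(max(hydropathy(soluble)) > 1.6)       # False
```

A window whose mean exceeds the cutoff is only a candidate; dedicated predictors weigh additional features such as flanking charges.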



Example 65 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 539>: 

1 . . GGGACGGGAG CGATGCTGCT GCTGTTTTAC GCGGTAACGA T.CTGCCTTT 

51 GGCCACTGGC GTTACCCTGA GTTACACCTC GTCGATTTTT TTGGCGGTAT 

101 TTTCCTTCCT GATTTTGAAA GAACGGATTT CCGTTTACAC GCAGGCGGTG 

151 CTGCTCCTTG GTTTTGCCGG CGTGGTATTG CTGCTTAATC CCTCGTTCCG 

201 CAGCGGTCAG GAAACGGCGG CACTCGCCGG GCTGGCGGGC GGCGCGATGT 

251 CCGGCTGGGC GTATTTGAAA GTGCGCGAAC TGTCTTTGGC GGGCGAACCC 

301 GGCTGGCGCG TCGTGTTTTA CCTTTCCGTG ACAGGTGTGG CGATGTCGTC 

351 GGTTTGGGCG ACGCTGACCG GCTGGCACAC CCTGTCCTTT CCATCGGCAG 

401 TTTATCTGTC GTGCATCGGC GTGTCCGCGC TGATTGCCCA ACTGTCGATG 

451 ACGCGCGCCT ACAAAGTCGG CGACAAATTC ACGGTTGCCT CGCTTTCCTA 

501 TATGACCGTC GTTTTTTCCG CTCTGTCTGC CGCATTTTTT CTGGGCGAAG 

551 AGCTTTTCTG GCAGGAAATA CTCGGTATGT GCATCATCAT CgTCAGCGGT 

601 ATTTTGA 

This corresponds to the amino acid sequence <SEQ ID 540; ORF135>: 



1 . . GTGAMLLLFY AVTILPLATG VTLSYTSSIF LAVFSFLILK ERISVYTQAV 
51 LLLGFAGVVL LLNPSFRSGQ ETAALAGLAG GAMSGWAYLK VRELSLAGEP 
101 GWRVVFYLSV TGVAMSSVWA TLTGWHTLSF PSAVYLSCIG VSALIAQLSM
151 TRAYKVGDKF TVASLSYMTV VFSALSAAFF LGEELFWQEI LGMCIIISAV 
201 F* 

Further work revealed the complete nucleotide sequence <SEQ ID 541>: 



1 ATGGATACCG CAAAAAAAGA CATTTTAGGA TCGGGCTGGA TGCTGGTGGC
51 GGCGGCCTGC TTTACCATTA TGAACGTATT GATTAAAGAG GCATCGGCAA
101 AATTTGCCCT CGGCAGCGGC GAATTGGTCT TTTGGCGCAT GCTGTTTTCA
151 ACCGTTGCGC TCGGGGCTGC CGCCGTATTG CGTCGGGACA mCTTCCGCAC
201 GCCCCATTGG AAAAACCACT TAAACCGCAG TATGGTCGGG ACGGGGGCGA
251 TGCTGCTGCT GTTTTACGCG GTAACGCATC TGCCTTTGGC CACTGGCGTT
301 ACCCTGAGTT ACACCTCGTC GATTTTTTTG GCGGTATTTT CCTTCCTGAT
351 TTTGAAAGAA CGGATTTCCG TTTACACGCA GGCGGTGCTG CTCCTTGGTT
401 TTGCCGGCGT GGTATTGCTG CTTAATCCCT CGTTCCGCAG CGGTCAGGAA
451 ACGGCGGCAC TCGCCGGGCT GGCGGGCGGC GCGATGTCCG GCTGGGCGTA
501 TTTGAAAGTG CGCGAACTGT CTTTGGCGGG CGAACCCGGC TGGCGCGTCG
551 TGTTTTACCT TTCCGTGACA GGTGTGGCGA TGTCGTCGGT TTGGGCGACG
601 CTGACCGGCT GGCACACCCT GTCCTTTCCA TCGGCAGTTT ATCTGTCGTG
651 CATCGGCGTG TCCGCGCTGA TTGCCCAACT GTCGATGACG CGCGCCTACA
701 AAGTCGGCGA CAAATTCACG GTTGCCTCGC TTTCCTATAT GACCGTCGTT
751 TTTTCCGCTC TGTCTGCCGC ATTTTTTCTG GGCGAAGAGC TTTTCTGGCA
801 GGAAATACTC GGTATGTGCA TCATCATCCT CAGCGGTATT TTGAGCAGCA
851 TCCGCCCCAC TGCCTTCAAA CAGCGGCTGC AATCCCTGTT CCGCCAAAGA
901 TAA



This corresponds to the amino acid sequence <SEQ ID 542; ORF135-1>:

1 MDTAKKDILG SGWMLVAAAC FTIMNVLIKE ASAKFALGSG ELVFWRMLFS
51 TVALGAAAVL RRDXFRTPHW KNHLNRSMVG TGAMLLLFYA VTHLPLATGV
101 TLSYTSSIFL AVFSFLILKE RISVYTQAVL LLGFAGVVLL LNPSFRSGQE
151 TAALAGLAGG AMSGWAYLKV RELSLAGEPG WRVVFYLSVT GVAMSSVWAT
201 LTGWHTLSFP SAVYLSCIGV SALIAQLSMT RAYKVGDKFT VASLSYMTVV
251 FSALSAAFFL GEELFWQEIL GMCIIILSGI LSSIRPTAFK QRLQSLFRQR
301 *



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF135 shows 99.0% identity over a 197aa overlap with an ORF (ORF135a) from strain A of N. 
meningitidis: 

10 20 30 

orfl35.pep GTGAMLLLFYAVT I LPLATGVTL S YT SSIF 

M I I I I I II I II I I I II I I I I II M I I I I 
orfl35a STVALGAAAVLRRDTFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLATGVTLSYTSSIF 
50 60 70 80 90 100 

40 50 60 70 80 90 

orf 135 . pep LAVFSFLILKERISVYTQAVLLLGFAGWLLLNPSFRSGQETAALAGLAGGAMSGWAYLK 
I I I I I I I I I I II I I I I I I I I I It I I II I I I I I I I I I I I II I I I I I I II I M II I II I I II 
orf 135a LAVFSFLILKERISVYTQAVLLLGFAGWLLLNPSFRSGQETAALAGLAGGAMSGWAYLK 
110 120 130 140 150 160 

100 110 120 130 140 150 

orf 135 . pep VRELSLAGEFGWRVVFYLSVTGVAMSSVWATLTGWHTLSFFSAVYLSCIGVSALIAQLSM 
I i I i I I I i M I M I ! II I I M I I I I II I I I I I I i I i I I i I M I I I I I I I I I M I M II I I 
orf 135a VRELSLAGEPGWRVVFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSM 
170 180 190 200 210 220 

160 170 180 190 200 

orf 135. pep TRAYKVGDKFTVASLSYMT WFSALSAAFFLGEELFWQE I LGMCI 1 1 SAVFX 

I I II M I I I I I II I I II I I I I I I I I I I I I I I : I M I I I I I I I I! I I I 
orf 135a TRAYKVGDKFTVASLSYMTWFSALSAAFFLAEELFWQEILGMCIIILSGILSSIRPTAF 
230 240 250 260 270 280 

orf 135a KQRLQSLFRQRX 
290 300 

The complete length ORF135a nucleotide sequence <SEQ ID 543> is:

1 ATGGATACCG CAAAAAAAGA CATTTTAGGA TCGGGCTGGA TGCTGGTGGC
51 GGCGGCCTGC TTTACCATTA TGAACGTATT GATTAAAGAG GCATCGGCAA
101 AATTTGCCCT CGGCAGCGGC GAATTGGTCT TTTGGCGCAT GCTGTTTTCA
151 ACCGTTGCGC TCGGGGCTGC CGCCGTATTG CGTCGGGACA CCTTCCGCAC
201 GCCCCATTGG AAAAACCACT TAAACCGCAG TATGGTCGGG ACGGGGGCGA
251 TGCTGCTGCT GTTTTACGCG GTAACGCATC TGCCTTTGGC CACCGGCGTT
301 ACCCTGAGTT ACACCTCGTC GATTTTTTTG GCGGTATTTT CCTTCCTGAT
351 TTTGAAAGAA CGGATTTCCG TTTACACGCA GGCGGTGCTG CTCCTTGGTT
401 TTGCCGGCGT GGTATTGCTG CTTAATCCCT CGTTCCGCAG CGGTCAGGAA
451 ACGGCGGCAC TCGCCGGGCT GGCGGGCGGC GCGATGTCCG GCTGGGCGTA
501 TTTGAAAGTG CGCGAACTGT CTTTGGCGGG CGAACCCGGC TGGCGCGTCG
551 TGTTTTACCT TTCCGTGACA GGTGTGGCGA TGTCATCGGT TTGGGCGACG
601 CTGACCGGCT GGCACACCCT GTCCTTTCCA TCGGCAGTTT ATCTGTCGTG
651 CATCGGCGTG TCCGCGCTGA TTGCCCAACT GTCGATGACG CGCGCCTACA
701 AAGTCGGCGA CAAATTCACG GTTGCCTCGC TTTCCTATAT GACCGTCGTT
751 TTTTCCGCTC TGTCTGCCGC ATTTTTTCTG GCCGAAGAGC TTTTCTGGCA
801 GGAAATACTC GGTATGTGCA TCATCATCCT CAGCGGTATT TTGAGCAGCA
851 TCCGCCCCAC TGCCTTCAAA CAGCGGCTGC AATCCCTGTT CCGCCAAAGA
901 TAA



This encodes a protein having amino acid sequence <SEQ ID 544>:

1 MDTAKKDILG SGWMLVAAAC FTIMNVLIKE ASAKFALGSG ELVFWRMLFS
51 TVALGAAAVL RRDTFRTPHW KNHLNRSMVG TGAMLLLFYA VTHLPLATGV
101 TLSYTSSIFL AVFSFLILKE RISVYTQAVL LLGFAGVVLL LNPSFRSGQE
151 TAALAGLAGG AMSGWAYLKV RELSLAGEPG WRVVFYLSVT GVAMSSVWAT
201 LTGWHTLSFP SAVYLSCIGV SALIAQLSMT RAYKVGDKFT VASLSYMTVV
251 FSALSAAFFL AEELFWQEIL GMCIIILSGI LSSIRPTAFK QRLQSLFRQR
301 *



ORF135a and ORF135-1 show 99.3% identity in 300 aa overlap:

orf135a.pep MDTAKKDILGSGWMLVAAACFTIMNVLIKEASAKFALGSGELVFWRMLFSTVALGAAAVL
I I ( I I II I i I I t t I II I I t I ! I I t i I I I I II I I I t I I I I I I M M I 11 It I M I M I M I
orf135-1 MDTAKKDILGSGWMLVAAACFTIMNVLIKEASAKFALGSGELVFWRMLFSTVALGAAAVL

orf135a.pep RRDTFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLATGVTLSYTSSIFLAVFSFLILKE
I | | : ! I I I I I I I I I I I I I I I I I I I I I II I M I I I I I I ! I I I I II I I I I I M I t I I I M I I
orf135-1 RRDXFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLATGVTLSYTSSIFLAVFSFLILKE

orf135a.pep RISVYTQAVLLLGFAGVVLLLNPSFRSGQETAALAGLAGGAMSGWAYLKVRELSLAGEPG
I I I I I I I M I I II I I I M I I I I I I I I I I II I I II 1 N II I I M II M I I I M I I I I I I M
orf135-1 RISVYTQAVLLLGFAGVVLLLNPSFRSGQETAALAGLAGGAMSGWAYLKVRELSLAGEPG

orf135a.pep WRVVFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSMTRAYKVGDKFT
I I I I I I I M i I I I M I I I I I I t M I I II M I t I I I I I I I I I I I I I II I I t M I I I I ( I I I
orf135-1 WRVVFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSMTRAYKVGDKFT

orf135a.pep VASLSYMTVVFSALSAAFFLAEELFWQEILGMCIIILSGILSSIRPTAFKQRLQSLFRQR
I I M I I I II I I I I I I I I I I I : I I M I I 1 I II M I I I ! I I I II I I I I I I II I I I M I I I I I
orf135-1 VASLSYMTVVFSALSAAFFLGEELFWQEILGMCIIILSGILSSIRPTAFKQRLQSLFRQR

Homology with a predicted ORF from N. gonorrhoeae 

ORF135 shows 97% identity over a 201aa overlap with a predicted ORF (ORF135ng) from
N. gonorrhoeae:

orf135.pep GTGAMLLLFYAVTXLPLATGVTLSYTSSIF 30
I I I It I I I I I I I ! I I I : I M I II I I I II I
orf135ng STVTLGAAAVLRRDTFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLTTGVTLSYTSSIF 335

orf135.pep LAVFSFLILKERISVYTQAVLLLGFAGVVLLLNPSFRSGQETAALAGLAGGAMSGWAYLK 90
I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I II I I I I I I I I I
orf135ng LAVFSFLILKERISVYTQAVLLLGFAGVVLLLNPSFRSGQEPAALAGLAGGAMSGWAYLK 395

orf135.pep VRELSLAGEPGWRVVFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSM 150
I I I I I I I M II II I I I I I I : II I II M 1 I I ! I I II II I I I I I I II I I II M I I I I I I I I
orf135ng VRELSLAGEPGWRVVFYLSATGVAMSSVWATLTGWHTLSFPSAVYLSGIGVSALIAQLSM 455

orf135.pep TRAYKVGDKFTVASLSYMTVVFSALSAAFFLGEELFWQEILGMCIIISAVF 201
II II II I I II I I M I I I I I I I I I If I I I I M I I 1 I I I I I I I I I M I I I I : I
orf135ng TRAYKVGDKFTVASLSYMTVVFSALSAAFFLGEELFWQEILGMCIIISAAF 506



An ORF135ng nucleotide sequence <SEQ ID 545> was predicted to encode a protein having amino
acid sequence <SEQ ID 546>:

1 MPSEKAFRRH LRTASFQGLH LHHFHQKVGK CGIIGFGIHI FPTLLPAA QG 
51 ILDIQLGLFR IDFAALAVYR RTQVDFIHTV IDGIASDQAF SEWQILRRL 
101 NLGHFTDTHL IAQARRFIAD FGNIRPMRRG EAKTFCRCFR FDGIDGIHGD
151 FRQCGHINRL APGKDCRNGK RDKVFFHTRH YNQVCLEKTN CSARKIKFRH

201 QKQAKTHSTS LAARFTIRPS LSQRPFMDTA KKDILGS GWM LVAAACFTVM 

251 NVLIKEASAK FALGSGELVF WRMLFSTVTL GAAAVLRRDT FRTPHWKNHL 

301 NRSMVGTGAM LLLFYAVTHL PLTTGVT LSY TSSIFLAVFS FLIL KERISV 

351 YTOA VLLLGF AGWLLLNPS F RSGQEPAAL AGLAGGAMSG WAYLKVRELS 

401 LAGEPGWRW FYLSATGVAM SSVWATLTGW HTLS FPSAVY LSGIGVSALI 

451 AQLSMTRAYK VGDKFTVAS L SYMTWFSAL SAAFFL GEE L FWQEILGMCI 

501 IISAAF*

Further work revealed the following gonococcal sequence <SEQ ID 547>: 

1 ATGGATACCG CAAAAAAAGA CATTTTAGGA TCGGGCTGGA TGCTGGTGGC 

51 GGCGGCCTGC TTCACCGTTA TGAACGTATT GATTAAAGAG GCATCGGCAA 

101 AATTTGCCCT CGGCAGCGGC GAATTGGTCT TTTGGCGCAT GCTGTTTTCA 

151 ACCGTTACGC TCGGTGCTGC CGCCGTATTG CGGCGCGACA CCTTCCGCAC 

201 GCCCCATTGG AAAAACCACT TAAACCGCAG TATGGTCGGG ACGGGGGCGA

251 TGCTGCTGCT GTTTTACGCG GTAACGCATC TGCCTTTGAC AACCGGCGTT 

301 ACCCTGAGTT ACACCTCGTC GATTTTTttg GCGGTATTTT CCTTCCTGAT 

351 TTTGAAAGAA CGGATTTCCG TTTACACGCA GGCGGTGCTG CTCCTTGGTT 

401 TTGCCGGCGT GGTATTGCTG CTTAATCCCT CGTTCCGCAG CGGTCAGGAA 

451 CCGGCGGCAC TCGCCGGGCT GGCGGGCGGC GCGATGTCCG GCTGGGCGTA 

501 TTTGAAAGTG CGCGAACTGT CTTTGGCGGG CGAACCCGGC TGGCGCGTCG 

551 TGTTTTACCT TTCCGCAACC GGCGTGGCGA TGTCGTCggt ttgggcgacg 

601 Ctgaccggct ggCACAcccT GTCCTTTcca tcggcagttt ATCtgtCGGG 

651 CATCGGCGTG tccgcgCtgA TTGCCCAaCT GtcgatgAcg cGCGcctaca 

701 aaGTCGGCGA CAAATTCACG GTTGCCTCGC tttcctaTAt gaccgtcGTC

751 TTTTCCGCCC TGTCTGCCGC ATTTTTTCTg ggcgaagagc tttTCtggCA

801 GGAAATACTC GGTATGTGCA TCATTAtCcT CAGCGGCATT TTGAGCAGCA 

851 TCCGCCCCAT TGCCTTCAAA CAGCGGCTGC AAGCCCTCTT CCGCCAAAGA 

901 TAA 

This corresponds to the amino acid sequence <SEQ ID 548; ORF135ng-1>:

1 MDTAKKDILG SGWMLVAAAC FTVMNVLIKE ASAKFALGSG ELVFWRMLFS
51 TVTLGAAAVL RRDTFRTPHW KNHLNRSMVG TGAMLLLFYA VTHLPLTTGV
101 TLSYTSSIFL AVFSFLILKE RISVYTQAVL LLGFAGVVLL LNPSFRSGQE
151 PAALAGLAGG AMSGWAYLKV RELSLAGEPG WRVVFYLSAT GVAMSSVWAT
201 LTGWHTLSFP SAVYLSGIGV SALIAQLSMT RAYKVGDKFT VASLSYMTVV
251 FSALSAAFFL GEELFWQEIL GMCIIILSGI LSSIRPIAFK QRLQALFRQR
301 *

ORF135ng-1 and ORF135-1 show 97.0% identity in 300 aa overlap:

orf 135ng-l . pep MDTAKKDILGSGWMLVAAACFTVMNVLIKEASAKFALGSGELVFWRMLFSTVTLGAAAVL 
II M I I I M I I t I ! II M I I I ! : I 1 I 1 ! I I I I I II ! II ! M I II I I II I I II : I II I I ! I 
orf 135-1 MDTAKKDILGSGWMLVAAACFTIMNVLIKEASAKFALGSGELVFWRMLFSTVALGAAAVL 

orf 135ng-l.pep RRDTFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLTTGVTLSYTSSIFLAVFSFLILKE 
M I : I I I II I I I II I t I I I I I I I I I I M I M I I I I I : II I I I I II I I I I I I I I I I I I M I 
orf 135-1 RRDXFRT PHWKNHLNRSMVGTGAMLLLFYAVTHL PLATGVTLS YT SSI FLAVFS FLI LKE 

orf 135ng-l . pep R I S V YT QAV L L LG FAG WL LLN P S FR S G QE P AALAG LAGG AM S GWA Y LK VRE L S LAGE PG 
! I I I I I II II I M II 1 I i I I I I I I I I I M I I I I I I I I I I I I I I I I I II ( I M M I II I I 
orf 135-1 RISVYTQAVLLLGFAGWLLLNPSFRSGQETAALAGLAGGAMSGWAYLKVRELSLAGEPG 

orfl35ng-l .pep WRWFYLSATGVAMSSVWATLTGWHTLSFPSAVYLSGIGVSALIAQLSMTRAYKVGDKFT 
I I M II I I : I I I I I I I I M I I I I I II I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 135-1 WRWFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSMTRAYKVGDKFT 

orf 135ng-l . pep V AS LSYMTVVFSAL S AAF FLGEEL FW QE I LGMC IIILSGILSSIRP I AFKQRLQAL FRQR 
I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I II M : I I I I I 
orf 135-1 VASLSYMTVVFSALSAAFFLGEELFWQEILGMCIIILSGILSSIRPTAFKQRLQSLFRQR 
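The identity figures quoted for each alignment are the share of aligned positions at which both rows carry the same residue. A minimal sketch of that calculation follows; `percent_identity` is a hypothetical helper, and the two strings are the final 60-residue block of the alignment above with OCR spacing removed.

```python
def percent_identity(a: str, b: str):
    """Return (matches, aligned length, percent identity) for two aligned rows."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    # A gap character never counts as a match.
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches, len(a), 100.0 * matches / len(a)

block_ng = "VASLSYMTVVFSALSAAFFLGEELFWQEILGMCIIILSGILSSIRPIAFKQRLQALFRQR"
block_mc = "VASLSYMTVVFSALSAAFFLGEELFWQEILGMCIIILSGILSSIRPTAFKQRLQSLFRQR"

matches, length, pct = percent_identity(block_ng, block_mc)
print(matches, length, round(pct, 1))  # 58 60 96.7
```

Over the whole 300-residue overlap, the two listed sequences appear to differ at nine positions, and 291/300 gives the stated 97.0%.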

Based on this analysis, including the presence of several putative transmembrane domains in the 
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 66

The following DNA sequence was identified in N. meningitidis <SEQ ID 549>: 

1 ATGAAGCGGC GTATAGCCGT CTTCGTCCTG TTCCCGCAGA TAATCCGAGT 

51 TTTGGGACAA CTGTTGCCGA AAATCGTCAA TACAGTTCCG GCACATCGGA 

101 TGCTCTTCCA GATTTTCGGG ATGTTCTTTT TCTTCATACA CCAGCAATAT 

151 CTGCCCGGGA TCGCCGAAAT CGATTCCCCA TGCGGCATCG TGTTCGGTGC 

201 GCTCCTCTTC CGTCATCTGC CCGCGCATTG CCTGTATGGT AAAGCCGCCG 

251 TAGGGGATGC CgTTGCACAC GAACATCCAG TCGCTGATGT CGTCAACCGG 

301 AACGCAAACG cTTTCGCCTT GTTCGACATT GGTCAGTTCG CCsGGTTCAT 

351 TGTTCAGCAC ACCGTAAATA TAAAGACCGT CAAAATAAAT ATCGTCGATC 

401 CACATATGTT CGCAAATTTC GCCGTCTTCG CCGTCTTGGA AAAAAGGGAC 

451 TTTGACCATG GCAAAATCCA AGGCGGAAAT AATGCGGCGG CGTTCCCAAA

501 AAAGcTCGCG CCAAAAATAT TTGAATGTTT TACGGGCGCG TTCGTCGGCA 

551 CGGTTTACCG GTTCGTCTGC CTGTTCTACA TAATAAATGA CGGAATCGCC 

601 CATCAT^TCT GCTCCTCAAC GTGTACGGTA TCTGTTTGCA CCTTACTGCG 

651 GCTTTCTgcC kTCGGCATCC GATTCGGATT TGAAAAGTTC mmrwyATTCG 

701 GAATAG

This corresponds to the amino acid sequence <SEQ ID 550; ORF136>: 

1 MKRRIAVFVL FPQIIRVLGQ LLPKIVNTVP AHRMLFQIFG MFFFFIHQQY
51 LPGIAEIDSP CGIVFGALLF RHLPAHCLYG KAAVGDAVAH EHPVADVVNR
101 NANAFALFDI GQFAXFIVQH TVNIKTVKIN IVDPHMFANF AVFAVLEKRD
151 FDHGKIQGGN NAAAFPKKLA PKIFECFTGA FVGTVYRFVC LFYIINDGIA
201 HHSAPQRVRY LFAPYCGFLP SASDSDLKSS XXSE*

Further work revealed the complete nucleotide sequence <SEQ ID 551>:

1 ATGATGAAGC GGCGTATAGC CGTCTTCGTC CTGTTCCCGC AGATAATCCG 

51 AGTTTTGGGA CAACTGTTGC CGAAAATCGT CAATACAGTT CCGGCACATC 

101 GGATGCTCTT CCAGATTTTC GGGATGTTCT TTTTCTTCAT ACACCAGCAA 

151 TATCTGCCCG GGATCGCCGA AATCGATTCC CCATGCGGCA TCGTGTTCGG 

201 TGCGCTCCTC TTCCGTCATC TGCCCGCGCA TTGCCTGTAT GGTAAAGCCG 

251 CCGTAGGGGA TGCCGTTGCA CACGAACATC CAGTCGCTGA TGTCGTCAAC 

301 CGGAACGCAA ACGCTTTCGC CTTGTTCGAC ATTGGTCAGT TCGCCGGGTT 

351 CATTGTTCAG CACACCGTAA ATATAAAGAC CGTCAAAATA AATATCGTCG 

401 ATCCACATAT GTTCGCAAAT TTCGCCGTCT TCGCCGTCTT GGAAAAAAGG

451 GACTTTGACC ATGGCAAAAT CCAAGGCGGA AATAATGCGG CGGCGTTCCC 

501 AAAAAAGCTC GCGCCAAAAA TATTTGAATG TTTTACGGGC GCGTTCGTCG 

551 GCACGGTTTA CCGGTTCGTC TGCCTGTTCT ACATAATAAA TGACGGAATC

601 GCCCATCATT CTGCTCCTCA ACGTGTACGG TATCTGTTTG CACCTTACTG 

651 CGGCTTTCTG CCTTCGGCAT CCGATTCGGA TTTGAAAAGT TCCAAATATT 

701 CGGAATAG 

This corresponds to the amino acid sequence <SEQ ID 552; ORF136-1>:

1 MMKRRIAVFV LFPQIIRVLG QLLPKIVNTV PAHRMLFQIF GMFFFFIHQQ
51 YLPGIAEIDS PCGIVFGALL FRHLPAHCLY GKAAVGDAVA HEHPVADVVN
101 RNANAFALFD IGQFAGFIVQ HTVNIKTVKI NIVDPHMFAN FAVFAVLEKR
151 DFDHGKIQGG NNAAAFPKKL APKIFECFTG AFVGTVYRFV CLFYIINDGI
201 AHHSAPQRVR YLFAPYCGFL PSASDSDLKS SKYSE*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF136 shows 71.7% identity over a 237aa overlap with an ORF (ORF136a) from strain A of N.
meningitidis: 

10 20 30 40 50 59 

orfl3 6.pep MKRRIAVFVLFPQIIRVLGQLLPKIVNTVPAHRMLFQIFGMFFFFIHQQYLPGIAEIDS 
I I I I I I M I I : I I I : I I M I t I II I I I I I I I I I I I 1 I I II I I I I I I I I I) I I I I I 1 
orfl36a MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQXFGMFFFFIHQQYLPGIAEIDS 
10 20 30 40 50 60

60 70 80 90 100 110 119

orfl36 pep PCGIVFGALLFRHLPAHCLYGKAAVGDAVAHEHPVADWNRNANAFALFDIGQFAXFIVQ 
| | | i | I I : I I I I I : | I I I I I I I I I : I I I I I I I t I I I I I II I I I 1 I I I I M I I I I I I I 
orfl36a pcGIVFGTLLFRHXSTHCLYGKAAVGNAVAHEHPVADVVNRNANAFALFDIGQFAGFIVQ 
70 80 90 100 110 120 

120 130 140 150 160 170 179 

orfl36 pep HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKIFECFTG 

i : : I : 1 I I I I I I II I I I I M I I I I I I I I I : I : I : I : : : : 

orfl36a HAINVKTVKINIVDPHMFANFAXFAVLEKRALTMAKSKXXXMRRRSQKSSRQKYLNVLRA 
130 140 150 160 170 180 

180 190 200 210 220 230 

orfl36 pep AFVGTVYRFVCLFYI INDGIAHH SAPQRVRYLFAPYCGFLPSASDSDLKSSXXSEX 

: ||:| : : :: I I II t I I I I I I I I I I I II I I I I I I I I I I III 

orf 136a R SPARFTGLSACSTXXMTESPIISAPQRVRYLFAPYCGFLPSASDSDLKSSKYSEX 

190 200 210 220 230 

The complete length ORF136a nucleotide sequence <SEQ ID 553> is: 

1 ATGATGAAGC GGCGTATAGC CGTCTTCGTC CTGCTCATGC AGAAAATCCG 

51 GATTTTGGGA CAACTGTTGC CGAAAATCGT CAATACAGTT CCGGCACATC 

101 GGATGCTCTT CCAGATNTTC GGGATGTTCT TTTTCTTCAT ACACCAGCAA 

151 TACCTGCCCG GGATCGCCGA AATCGATTCC CCATGCGGCA TCGTGTTCGG 

201 TACGCTCCTC TTCCGTCATC NGTCCACGCA TTGCCTGTAT GGTAAAGCCG 

251 CCGTAGGGAA TGCCGTTGCA CACGAACATC CAGTCGCTGA TGTCGTCAAC 

301 CGGAACGCAA ACGCTTTCGC CTTGTTCGAC ATTGGTCAGT TCGCCGGGTT 

351 CATTGTTCAG CACGCCATAA ATGTAAAGAC CGTCAAAATA AATATCGTCG 

401 ATCCACATAT GTTCGCAAAT TTCGCCNTCT TCGCCGTCTT GGAAAAAAGG

451 GCTTTGACCA TGGCAAAATC TAAGGNGNNA NNGATGCGGC GGCGTTCCCA

501 AAAAAGCTCG CGCCAAAAAT ATTTGAATGT TTTGCGGGCG CGTTCGCCGG 

551 CACGGTTTAC CGGTTTGTCT GCCTGTTCTA CATAATAAAT GACGGAATCG 

601 CCCATCATAT CTGCTCCTCA ACGTGTACGG TATCTGTTTG CACCTTACTG

651 CGGCTTTCTG CCTTCGGCAT CCGATTCGGA TTTGAAAAGT TCCAAATATT 

701 CGGAATAG 

This encodes a protein having amino acid sequence <SEQ ID 554>: 

1 MMKRRIAVFV LLMQKIRILG QLLPKIVNTV PAHRMLFQXF GMFFFFIHQQ

51 YLPGIAEIDS PCGIVFGTLL FRHXSTHCLY GKAAVGNAVA HEHPVADVVN 

101 RNANAFALFD IGQFAGFIVQ HAINVKTVKI NIVDPHMFAN FAXFAVLEKR 

151 ALTMAKSKXX XMRRRSQKSS RQKYLNVLRA RSPARFTGLS ACST**MTES 

201 PIISAPQRVR YLFAPYCGFL PSASDSDLKS SKYSE* 

ORF136a and ORF136-1 show 73.1% identity in 238 aa overlap: 

10 20 30 40 50 60 

orf 13 6a . pep MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQXFGMFFFFIHQQYLPGIAEIDS 
I I I M I I I I I I : I ( I : I I II I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I II I I I I 
orf 13 6-1 MMKRRI AVFVLFPQI IRVLGQLLPKI VNTVPAHRMLFQI FGMFFFFIHQQYLPGIAEI DS 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 136a . pep PCGIVFGTLLFRHXSTHCLYGKAAVGNAVAHEHPVADVVNRNANAFALFDIGQFAGFIVQ 
1111111:11111 : I I I I I II I I I : I II I I I I I I II I I I I I I I I I I I I I I I II II I I I 
orf 136-1 P CG I V FGAL L FRH L P AHC L YG KAAVG DAV AHEH P V A D WNRN AN A FAL FD I GQFAG F I VQ 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 13 6a. pep HAINVKTVKINIVDPHMFANFAXFAVLEKRALTMAKSKXXXMRRRSQKSSRQKYLNVLRA 
I :: I : I I I I I I I I I I I I I I 1 I I I I III I I : : I : I : | : : : : 

orf 13 6-1 HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKIFECFTG 

130 140 150 160 170 180 

190 200 210 220 230 

orf 13 6a. pep R S PARFTGLSACSTXXMTES PI I SAPQRVRYLFAPYCGFLPSASDS DLKS SKYSEX 

: M : I : : : : I I I I I I I I I I I I I I I I M I I I I I M I I I I I I I I 

orf 13 6-1 AFVGTVYRFVCLFYI INDGIAHH SAPQRVRYLFAPYCGFLPSASDS DLKSSKYSEX

190 200 210 220 230

Homology with a predicted ORF from N. gonorrhoeae 

ORF136 shows 92.3% identity over a 234aa overlap with a predicted ORF (ORF136ng) from
N. gonorrhoeae:

orf136.pep MKRRIAVFVLFPQIIRVLGQLLPKIVNTVPAHRMLFQIFGMFFFFIHQQYLPGIAEIDS 59
| i J f | M I I I : I I I : I M I i I M I I I I I I I I 1 M I I I I I II I I I I : I I I I I I M t I I
orf136ng MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQIFGMFFFFIHRQYLPGIAEIDS 60

orf136.pep PCGIVFGALLFRHLPAHCLYGKAAVGDAVAHEHPVADVVNRNANAFALFDIGQFAXFIVQ 119
| | I ! II : ! I ! I ! ! i 1 I! ! i M I I M M I ; M N ! i ! : i ! I I M 1 i i ( i I i i t till
orf136ng PGGIVFGTLLFRHLSAHCLYGKAAVGDAVAHEHPVADVANRNANAFALFDIGQSAGFIVQ 120

orf136.pep HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKIFECFTG 179
IIIIMtlMIIIIIIIIIIIMIIMIIMIIMIIIIIIIIIMIIIIIMHIIIM
orf136ng HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKVFECFTG 180

orf136.pep AFVGTVYRFVCLFYIINDGIAHHSAPQRVRYLFAPYCGFLPSASDSDLKSSXXSE 234
] I : I M I i I i I I I f I I I I I I I I i : M I II II I I I I I I I I I I I I I I I I I I II
orf136ng AFAGTVYRFVCLFYIINDGIAHHTAPQRVRYLFAPYRGFLPPASDSDLKSSKYSE 235

The complete length ORF136ng nucleotide sequence <SEQ ID 555> is:

1 ATGATGAAGC GGCGTATAGC CGTCTTCGTC CTGCTCATGC AGAAAATCCG
51 GATTTTGGGA CAACTGTTGC CGAAAATCGT CAATACAGTT CCGGCACATC
101 GGATGCTCTT CCAAATTTTC GGGATGTTCT TTTTCTTCAT ACACCGGCAA
151 TACCTGCCCG GGATCGCCGA AATCGATTCC CCAGGCGGTA TCGTGTTCGG
201 TACGCTCCTC TTCCGTCATC TGTCCGCGCA TTGCCTGTAC GGTAAAGCCG
251 CCGTAGGGGA TGCCGTTGCA CACGAACATC CAGTCGCTGA TGTCGCCAAC
301 CGGAACGCAA ACGCTTTCGC CTTGTTCGAC ATTGGTCAGT CCGCCGGGTT
351 CATTGTTCAG CACACCGTAA ATATAAAGAC CGTCAAAATA AATATCGTCG
401 ATCCACATAT GTTCGCAAAT TTCGCCGTCT TCGCCGTCTT GGAAAAAAGG
451 GACTTTGACC ATGGCAAAAT CCAAGGCGGA AATAATGCGG CGGCGTTCCC
501 AAAAAAGCTC GCGCCAAAAG TATTTGAATG TTTTACGGGC GCGTTCGCCG
551 GCACGGTTTA CCGGTTCGTC TGCCTGTTCT ACATAATAAA TGACGGAATC
601 GCCCATCATA CTGCTCCTCA ACGTGTACGG TATCTGTTTG CACCTTACCG
651 CGGTTTTCTA CCTCCGGCAT CCGATTCGGA TTTGAAAAGT TCCAAATATT
701 CGGAATAG

This encodes a protein having amino acid sequence <SEQ ID 556>:

1 MMKRRIAVFV LLMQKIRILG QLLPKIVNTV PAHRMLFQIF GMFFFFIHRQ
51 YLPGIAEIDS PGGIVFGTLL FRHLSAHCLY GKAAVGDAVA HEHPVADVAN
101 RNANAFALFD IGQSAGFIVQ HTVNIKTVKI NIVDPHMFAN FAVFAVLEKR
151 DFDHGKIQGG NNAAAFPKKL APKVFECFTG AFAGTVYRFV CLFYIINDGI
201 AHHTAPQRVR YLFAPYRGFL PPASDSDLKS SKYSE*

ORF136ng and ORF136-1 show 93.6% identity in 235 aa overlap:

orf136ng MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQIFGMFFFFIHRQYLPGIAEIDS
M I I I I II M I : I I I : M I I M I I I I I I I I M I II I I I I I I I I I ! I : I I I I I I I I I I 1
orf136-1 MMKRRIAVFVLFPQIIRVLGQLLPKIVNTVPAHRMLFQIFGMFFFFIHQQYLPGIAEIDS

orf136ng PGGIVFGTLLFRHLSAHCLYGKAAVGDAVAHEHPVADVANRNANAFALFDIGQSAGFIVQ
I 11111:111111 I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I II I I I I I I I I I
orf136-1 PCGIVFGALLFRHLPAHCLYGKAAVGDAVAHEHPVADVVNRNANAFALFDIGQFAGFIVQ

orf136ng HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKVFECFTG
I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I II I M II I I M I II II i I I I I I : I I I I I I
orf136-1 HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKIFECFTG

orf136ng AFAGTVYRFVCLFYIINDGIAHHTAPQRVRYLFAPYRGFLPPASDSDLKSSKYSEX
I I : I I I I I I I I I I I I I I II I 1 I I : I I I I I I I If I I I I i I i I I I I I I I I II I I I I
orf136-1 AFVGTVYRFVCLFYIINDGIAHHSAPQRVRYLFAPYCGFLPSASDSDLKSSKYSEX

Based on the presence of the putative transmembrane domains in the gonococcal protein, it is
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 67 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 557>: 

1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGACCGCTTT TGGCAATCGC 

51 CGCCGCCGCG TTGCTTGCCG CC.TGCGGAC GGCGGGAAAT AATGCTGTCC 

101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGG TTTGGCACTC 

151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT GTAGGTATTA TTAAGGTTTT 

201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACC TCCGCAGGTT 

251 CGATTGTCGG CAACCTTTTT GCATCGGGTA TGTCGCCCGA CCGCCTCGAA 

301 TTGGAAGCCG AAATTTTAGG CAAAACCGAT TTGGTCGATT TAACCTTGTC 

351 CACCAATGGG TTTATCAAAG GCGCAAAGCT GCAAAATTAC ATCAACCGAA 

401 AACTCCGCGG CATGCAGATT CAGCAGTTTC CCATCAAATT TGCCGCC. . 

This corresponds to the amino acid sequence <SEQ ID 558; ORF137>: 

1 MENMVTFSKI RPLLAIAAAA LLAAXRTAGN NAVRKPVQTA KPAAVVGLAL

51 GGGASKGFAH VGIIKVLKEN GIPVKVVTGT SAGSIVGNLF ASGMSPDRLE

101 LEAEILGKTD LVDLTLSTNG FIKGAKLQNY INRKLRGMQI QQFPIKFAA. .

Further work revealed the complete nucleotide sequence <SEQ ID 559>: 

1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGACCGCTTT TGGCAATCGC 

51 CGCCGCCGCG TTGCTTGCCG CCTGCGGCAC GGCGGGAAAT AATGCTGTCC 

101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGG TTTGGCACTC 

151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT GTAGGTATTA TTAAGGTTTT 

201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACA TCGGCAGGTT 

251 CGATTGTCGG CAGCCTTTTT GCATCGGGTA TGTCGCCCGA CCGCCTCGAA 

301 TTGGAAGCCG AAATTTTAGG CAAAACCGAT TTGGTCGATT TAACCTTGTC 

351 CACCAGTGGT TTTATCAAAG GCGAAAAGCT GCAAAATTAC ATCAACCGAA 

401 AAGTCGGCGG CAGGCAGATT CAGCAGTTTC CCATCAAATT TGCCGCCGTT 

451 GCTACTGATT TTGAAACCGG CAAGGCCGTC GCTTTCAATC AGGGGAATGC 

501 CGGGCAGGCT GTGCGCGCTT CCGCCGCCAT TCCCAATGTG TTCCAACCCG 

551 TTATCATCGG CAGGCATACA TATGTTGACG GCGGTCTGTC GCAGCCCGTG 

601 CCCGTCAGTG CCGCCCGGCG GCAGGGGGCG AATTTCGTGA TTGCCGTCGA 

651 TATTTCCGCC CGTCCGGGCA AAAACATCAG CCAAGGTTTC TTCTCTTATC 

701 TCGATCAGAC GCTGAACGTA ATGAGCGTTT CTGCGTTGCA AAATGAGTTG 

751 GGGCAGGCGG ATGTGGTTAT CAAACCGCAG GTTTTGGATT TGGGTGCAGT 

801 CGGCGGATTC GATCAGAAAA AACGCGCCAT CCGGTTGGGT GAGGAGGCAG 

851 CACGTGCCGC ATTGCCTGAA ATCAAACGCA AACTGGCGGC ATACCGTTAT 

901 TGA 

This corresponds to the amino acid sequence <SEQ ID 560; ORF137-1>:

1 MENMVTFSKI RPLLAIAAAA LLAACGTAGN NAVRKPVQTA KPAAVVGLAL
51 GGGASKGFAH VGIIKVLKEN GIPVKVVTGT SAGSIVGSLF ASGMSPDRLE
101 LEAEILGKTD LVDLTLSTSG FIKGEKLQNY INRKVGGRQI QQFPIKFAAV
151 ATDFETGKAV AFNQGNAGQA VRASAAIPNV FQPVIIGRHT YVDGGLSQPV
201 PVSAARRQGA NFVIAVDISA RPGKNISQGF FSYLDQTLNV MSVSALQNEL
251 GQADVVIKPQ VLDLGAVGGF DQKKRAIRLG EEAARAALPE IKRKLAAYRY
301 *

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF137 shows 93.3% identity over a 149aa overlap with an ORF (ORF137a) from strain A of N. 
meningitidis: 
                    10        20        30        40        50        60
orf137.pep  MENMVTFSKIRPLLAIAAAALLAAXRTAGNNAVRKPVQTAKPAAVVGLALGGGASKGFAH
            ||||||||||||||||||||||||  ||||||:|||||||||||||||||||||||||||
orf137a     MENMVTFSKIRPLLAIAAAALLAACGTAGNNAARKPVQTAKPAAVVGLALGGGASKGFAH
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf137.pep  VGIIKVLKENGIPVKVVTGTSAGSIVGNLFASGMSPDRLELEAEILGKTDLVDLTLSTNG
            |||||||||||||||||||||||||||:||||||||||||||||||||||||||||||:|
orf137a     VGIIKVLKENGIPVKVVTGTSAGSIVGSLFASGMSPDRLELEAEILGKTDLVDLTLSTSG
                    70        80        90       100       110       120

                   130       140      149
orf137.pep  FIKGAKLQNYINRKLRGMQIQQFPIKFAA
            |||| |||||||||: | :||||||||||
orf137a     FIKGEKLQNYINRKVGGRRIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV
                   130       140       150       160       170       180
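
The identity figures quoted for each alignment are the number of identical aligned residues divided by the overlap length. A minimal Python sketch of that arithmetic follows, using residues 121-149 of the ORF137/ORF137a overlap shown above. The function name `percent_identity` is illustrative only, and the alignment program actually used for the analysis is not stated in this section.

```python
def percent_identity(seq_a: str, seq_b: str) -> tuple[int, int, float]:
    """Return (identical, overlap, percent) for two pre-aligned, equal-length strings.

    Gap columns ('-') are counted in the overlap but never as identical;
    'X' (unknown residue) only matches another 'X'.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    overlap = len(seq_a)
    return identical, overlap, 100.0 * identical / overlap


# Residues 121-149 of the ORF137 / ORF137a overlap.
orf137_tail = "FIKGAKLQNYINRKLRGMQIQQFPIKFAA"
orf137a_tail = "FIKGEKLQNYINRKVGGRRIQQFPIKFAA"

identical, overlap, pct = percent_identity(orf137_tail, orf137a_tail)
print(f"{identical}/{overlap} identical = {pct:.1f}%")

# 139 identical residues over the full 149 aa overlap reproduce the quoted 93.3%.
print(f"{100.0 * 139 / 149:.1f}%")
```

Applied to the three segments of the alignment in turn, the same count gives 57, 58 and 24 identical residues, which sum to the 139 used in the last line.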

The complete length ORF137a nucleotide sequence <SEQ ID 561> is: 

1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGACCGCTTT TGGCAATCGC

51 CGCCGCCGCG TTGCTTGCCG CCTGCGGCAC GGCGGGAAAT AATGCTGCCC 

101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGG TTTGGCACTC 

151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT GTAGGTATTA TTAAGGTTTT 

201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACA TCGGCAGGTT 

251 CGATAGTCGG CAGCCTTTTT GCATCGGGTA TGTCGCCCGA CCGCCTCGAA 

301 TTGGAAGCCG AAATTTTAGG TAAAACCGAT TTGGTCGATT TAACCTTGTC 

351 CACCAGTGGT TTTATCAAAG GCGAAAAGCT GCAAAATTAC ATCAACCGAA 

401 AAGTCGGCGG CAGGCGGATT CAGCAGTTTC CCATCAAATT TGCCGCCGTT

451 GCTACTGATT TTGAAACCGG CAAGGCCGTC GCTTTCAATC AAGGGAATGC

501 CGGGCAGGCT GTGCGCGCTT CCGCCGCCAT TCCCAATGTG TTCCAACCCG 

551 TTATCATCGG CAGGCATACA TATGTTGACG GCGGTCTGTC GCAGCCCGTG 

601 CCCGTCAGTG CCGCCCGGCG GCANGNNNNG NATNTCGTGA TTGCCGTCGA 

651 TATTTCCGCC CGTCCGAGCA AAAACATCAG CCAAGGCTTC TTCTCTTATC 

701 TCGATCAGAC GCTGAACGTA ATGAGCGTTT CCGCGTTGCA AAATGAGTTG 

751 GGGCAGGCGG ATGTGGTTAT CAAACCGCAG GTTTTGGATT TGGGTGCAGT

801 CGGCGGATTC GATCAGAAAA AACGCGCCAT CCGGTTGGGT GAGGAGGCAG 

851 CACGTGCCGC ATTGCCTGAA ATCAAACGCA AACTGGCGGC ATACCGTTAT 

901 TGA 

This encodes a protein having amino acid sequence <SEQ ID 562>: 



1 MENMVTFSKI RPLLAIAAAA LLAACGTAGN NAARKPVQTA KPAAVVGLAL

51 GGGASKGFAH VGIIKVLKEN GIPVKVVTGT SAGSIVGSLF ASGMSPDRLE

101 LEAEILGKTD LVDLTLSTSG FIKGEKLQNY INRKVGGRRI QQFPIKFAAV 

151 ATDFETGKAV AFNQGNAGQA VRASAAIPNV FQPVIIGRHT YVDGGLSQPV 

201 PVSAARRXXX XXVIAVDISA RPSKNISQGF FSYLDQTLNV MSVSALQNEL 

251 GQADVVIKPQ VLDLGAVGGF DQKKRAIRLG EEAARAALPE IKRKLAAYRY 

301 * 

ORF137a and ORF137-1 show 97.3% identity in 300 aa overlap: 



orf137a.pep  MENMVTFSKIRPLLAIAAAALLAACGTAGNNAARKPVQTAKPAAVVGLALGGGASKGFAH
             ||||||||||||||||||||||||||||||||:|||||||||||||||||||||||||||
orf137-1     MENMVTFSKIRPLLAIAAAALLAACGTAGNNAVRKPVQTAKPAAVVGLALGGGASKGFAH

orf137a.pep  VGIIKVLKENGIPVKVVTGTSAGSIVGSLFASGMSPDRLELEAEILGKTDLVDLTLSTSG
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf137-1     VGIIKVLKENGIPVKVVTGTSAGSIVGSLFASGMSPDRLELEAEILGKTDLVDLTLSTSG

orf137a.pep  FIKGEKLQNYINRKVGGRRIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV
             ||||||||||||||||||:|||||||||||||||||||||||||||||||||||||||||
orf137-1     FIKGEKLQNYINRKVGGRQIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV

orf137a.pep  FQPVIIGRHTYVDGGLSQPVPVSAARRXXXXXVIAVDISARPSKNISQGFFSYLDQTLNV
             |||||||||||||||||||||||||||     ||||||||||:|||||||||||||||||
orf137-1     FQPVIIGRHTYVDGGLSQPVPVSAARRQGANFVIAVDISARPGKNISQGFFSYLDQTLNV

orf137a.pep  MSVSALQNELGQADVVIKPQVLDLGAVGGFDQKKRAIRLGEEAARAALPEIKRKLAAYRY
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf137-1     MSVSALQNELGQADVVIKPQVLDLGAVGGFDQKKRAIRLGEEAARAALPEIKRKLAAYRY

Homology with a predicted ORF from N. gonorrhoeae

ORF137 shows 89.9% identity over a 149aa overlap with a predicted ORF (ORF137ng) from 
N. gonorrhoeae: 

orf137.pep  MENMVTFSKIRPLLAIAAAALLAAXRTAGNNAVRKPVQTAKPAAVVGLALGGGASKGFAH 60
            |||||||||||  |||||||||||  ||||||:|||||||||||||:|||||||||||||
orf137ng    MENMVTFSKIRSFLAIAAAALLAACGTAGNNAARKPVQTAKPAAVVALALGGGASKGFAH 60

orf137.pep  VGIIKVLKENGIPVKVVTGTSAGSIVGNLFASGMSPDRLELEAEILGKTDLVDLTLSTNG 120
            :||:|||||||||||||||||||||||:| ||||||||||||||||||||||||||||:|
orf137ng    IGIVKVLKENGIPVKVVTGTSAGSIVGSLLASGMSPDRLELEAEILGKTDLVDLTLSTSG 120

orf137.pep  FIKGAKLQNYINRKLRGMQIQQFPIKFAA 149
            |||| |||||||||: | |||||||||||
orf137ng    FIKGEKLQNYINRKVGGRQIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV 180

The complete length ORF137ng nucleotide sequence <SEQ ID 563> is: 

1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGATCATTTT TGGCAATCGC 

51 CGCCGCCGCG TTGCTTGCCG CCTGCGGTAC GGCGGGAAAC AATGCCGCCC 

101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGC TTTGGCACTC 

151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT ATAGGAATTG TTAAGGTTTT 

201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACA TCGGCAGGTT 

251 CGATAGTCGG CAGCCTTTTG GCATCGGGTA TGTCGCCCGA CCGCCTCGAA 

301 TTGGAAGCCG AGATTTTAGG TAAAACCGAT TTAGTCGATT TAACCTTGTC 

351 CACCAGTGGT TTTATCAAAG GCGAAAAGCT GCAAAATTAC ATCAACCGAA 

401 AAGTCGGCGG CAGGCAGATT CAGCAGTTTC CCATCAAATT TGCCGCCGTT 

451 GCCACTGATT TTGAAACCGG CAAGGCCGTC GCTTTCAATC AAGGGAATGC 

501 CGGGCAGGCG GTTCGTGCTT CCGCCGCCAT TCCCAATGTG TTCCAGCCAG 

551 TCATCATCGG CAGGCACAAA TATGTTGACG GCGGTCTGTC GCAGCCCGTG

601 CCCGTCAGTG CCGCTCGGCG GCAGGGGGCG AATTTCGTGA TTGCCGTCGA 

651 TATTTCCGCA CGTCCGAGCA AAAATGTCGG TCAAGGTTTC TTCTCTTATC 

701 TCGATCAGAC GCTGAACGTG ATGAGCGTTT CCGTGTTGCA AAACGAGTTG 

751 gggcAGGCGG ATGTGGTTAT CAAACCGCag gtTTTGGATT TGGGTGCAGT 

801 CGGCGGATTC GATCAGAAAA AGCGCGCCAT CCGGTTGGGC GAGGAGGCAG 

851 CACGTGCCGC ATTGCCTGAA ATCAAACGCA AACTGGCGGC ATACCGTTAT 

901 TGA 

This encodes a protein having amino acid sequence <SEQ ID 564>: 

1 MENMVTFSKI RSFLAIAAAA LLAACGTAGN NAARKPVQTA KPAAVVALAL

51 GGGASKGFAH IGIVKVLKEN GIPVKVVTGT SAGSIVGSLL ASGMSPDRLE

101 LEAEILGKTD LVDLTLSTSG FIKGEKLQNY INRKVGGRQI QQFPIKFAAV 

151 ATDFETGKAV AFNQGNAGQA VRASAAIPNV FQPVIIGRHK YVDGGLSQPV 

201 PVSAARRQGA NFVIAVDISA RPSKNVGQGF FSYLDQTLNV MSVSVLQNEL 

251 GQADVVIKPQ VLDLGAVGGF DQKKRAIRLG EEAARAALPE IKRKLAAYRY

301 * 

ORF137ng and ORF137-1 show 96.0% identity in 300 aa overlap: 

orf137ng  MENMVTFSKIRSFLAIAAAALLAACGTAGNNAARKPVQTAKPAAVVALALGGGASKGFAH
          |||||||||||  |||||||||||||||||||:|||||||||||||:|||||||||||||
orf137-1  MENMVTFSKIRPLLAIAAAALLAACGTAGNNAVRKPVQTAKPAAVVGLALGGGASKGFAH

orf137ng  IGIVKVLKENGIPVKVVTGTSAGSIVGSLLASGMSPDRLELEAEILGKTDLVDLTLSTSG
          :||:||||||||||||||||||||||||| ||||||||||||||||||||||||||||||
orf137-1  VGIIKVLKENGIPVKVVTGTSAGSIVGSLFASGMSPDRLELEAEILGKTDLVDLTLSTSG

orf137ng  FIKGEKLQNYINRKVGGRQIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf137-1  FIKGEKLQNYINRKVGGRQIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV

orf137ng  FQPVIIGRHKYVDGGLSQPVPVSAARRQGANFVIAVDISARPSKNVGQGFFSYLDQTLNV
          ||||||||| ||||||||||||||||||||||||||||||||:||::|||||||||||||
orf137-1  FQPVIIGRHTYVDGGLSQPVPVSAARRQGANFVIAVDISARPGKNISQGFFSYLDQTLNV

orf137ng  MSVSVLQNELGQADVVIKPQVLDLGAVGGFDQKKRAIRLGEEAARAALPEIKRKLAAYRY
          ||||:|||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf137-1  MSVSALQNELGQADVVIKPQVLDLGAVGGFDQKKRAIRLGEEAARAALPEIKRKLAAYRY

Based on the presence of a predicted prokaryotic membrane lipoprotein lipid attachment site
(underlined) in the gonococcal protein, it is predicted that the proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.
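
A lipid attachment site of this kind is typically recognised by a short sequence pattern ending in the cysteine that becomes lipid-modified. The Python sketch below is an illustration under stated assumptions: the regular expression is modelled on the PROSITE-style consensus for prokaryotic membrane lipoproteins and is not necessarily the rule applied in the analysis above, and `find_lipobox` is a hypothetical helper name.

```python
import re

# Assumed lipobox-style pattern: six residues that are not D, E, R or K,
# then a short hydrophobic/small stretch, ending in the candidate lipidated Cys.
LIPOBOX = re.compile(r"[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C")


def find_lipobox(protein: str):
    """Return the 1-based position of the candidate lipidated Cys, or None."""
    match = LIPOBOX.search(protein)
    # match.end() is the 0-based exclusive end, i.e. the 1-based index of the Cys.
    return match.end() if match else None


# N-terminal 30 residues of the gonococcal protein (SEQ ID 564).
orf137ng_nterm = "MENMVTFSKIRSFLAIAAAALLAACGTAGN"
print(find_lipobox(orf137ng_nterm))
```

With this assumed pattern the only cysteine in the N-terminal region, at position 25, is reported as the candidate attachment site.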



Example 68 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 565>: 

1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA 

51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGcTG CCGCTTTCCT 

101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA 

151 AAGGAAGACC GCGCGCGCAT CGTCGCCmAT ATGCGGCAGG CGGGTTTGAA 

201 CCCCGACCCC AAAACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAAGGCG 

251 GTTTGGAACT TGCCCCCGCG TTTTTCAGAA AACCGGAAGA CATAGAAACA 

301 ATGTTCAAAG CGGTACACGG CTGGGAACAT GTGCAGCAGG CTTTGGACAA 

351 ACACGAAGGG CTGCTATTC. . 

This corresponds to the amino acid sequence <SEQ ID 566; ORF138>: 



1 MFRLQFRLFP PLRTAMHILL TALLKCLSLL PLSCLHTLGN RLGHLAFYLL 
51 KEDRARIVAX MRQAGLNPDP KTVKAVFAET AKGGLELAPA FFRKPEDIET 
101 MFKAVHGWEH VQQALDKHEG LLF 
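
The amino acid sequences in these examples follow from the DNA by reading successive codons. A minimal Python sketch is given below, assuming the standard genetic code and reading frame 1, and mapping any codon that contains an ambiguity symbol (such as the `m` in SEQ ID 565) to `X`. The function name `translate` is illustrative and not taken from this document.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, with codons enumerated in TCAG order at each position.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; unknown or ambiguous codons become 'X'."""
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    codons = (dna[i:i + 3] for i in range(0, usable, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# Nucleotides 1-30 of SEQ ID 565, then nucleotides 151-180 (which contain 'm').
print(translate("ATGTTTCGTT TACAATTCAG GCTGTTTCCC"))
print(translate("AAGGAAGACC GCGCGCGCAT CGTCGCCmAT"))
```

The two calls give `MFRLQFRLFP` and `KEDRARIVAX`, matching residues 1-10 and 51-60 of SEQ ID 566, including the `X` at position 60.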

Further work revealed the complete nucleotide sequence <SEQ ID 567>: 



1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA 

51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGCTG CCGCTTTCCT 

101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA 

151 AAGGAAGACC GCGCGCGCAT CGTCGCCAAT ATGCGGCAGG CGGGTTTGAA 

201 CCCCGACCCC AAAACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAAGGCG 

251 GTTTGGAACT TGCCCCCGCG TTTTTCAGAA AACCGGAAGA CATAGAAACA 

301 ATGTTCAAAG CGGTACACGG CTGGGAACAT GTGCAGCAGG CTTTGGACAA 

351 ACACGAAGGG CTGCTATTCA TCACGCCGCA CATCGGCAGC TACGATTTGG 

401 GCGGACGCTA CATCAGCCAG CAGCTTCCGT TCCCGCTGAC CGCCATGTAC

451 AAACCGCCGA AAATCAAAGC GATAGACAAA ATCATGCAGG CGGGCAGGGT

501 TCGCGGCAAA GGAAAAACCG CGCCTACCAG CATACAAGGG GTCAAACAAA 

551 TCATCAAAGC CCTGCGTTCG GGCGAAGCAA CCATCGTCCT GCCCGACCAC 

601 GTCCCCTCCC CTCAAGAAGG CGGGGAAGGC GTATGGGTGG ATTTCTTCGG 

651 CAAACCTGCC TATACCATGA CGCTGGCGGC AAAATTGGCA CACGTCAAAG

701 GCGTGAAAAC CCTGTTTTTC TGCTGCGAAC GCCTGCCTGG CGGACAAGGT

751 TTCGATTTGC ACATCCGCCC CGTCCAAGGG GAATTGAACG GCGACAAAGC 

801 CCATGATGCC GCCGTGTTCA ACCGCAATGC CGAATATTGG ATACGCCGTT 

851 TTCCGACGCA GTATCTGTTT ATGTACAACC GCTACAAAAT GCCGTAA 

This corresponds to the amino acid sequence <SEQ ID 568; ORF138-1>:



1 MFRLQFRLFP PLRTAMHILL TALLKCLSLL PLSCLHTLGN RLGHLAFYLL

51 KEDRARIVAN MRQAGLNPDP KTVKAVFAET AKGGLELAPA FFRKPEDIET 

101 MFKAVHGWEH VQQALDKHEG LLFITPHIGS YDLGGRYISQ QLPFPLTAMY 

151 KPPKIKAIDK IMQAGRVRGK GKTAPTSIQG VKQIIKALRS GEATIVLPDH

201 VPSPQEGGEG VWVDFFGKPA YTMTLAAKLA HVKGVKTLFF CCERLPGGQG 

251 FDLHIRPVQG ELNGDKAHDA AVFNRNAEYW IRRFPTQYLF MYNRYKMP* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF138 shows 99.2% identity over a 123aa overlap with an ORF (ORF138a) from strain A of N. 
meningitidis: 

                    10        20        30        40        50        60
orf138.pep  MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAX
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf138a     MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAN
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf138.pep  MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf138a     MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG
                    70        80        90       100       110       120

orf138.pep  LLF
            |||
orf138a     LLFITPHIGSYDLGGRYISQQLPFPLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTSIQG
                   130       140       150       160       170       180

The complete length ORF138a nucleotide sequence <SEQ ID 569> is:

1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA 

51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGCTG CCGCTTTCCT 

101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA 

151 AAGGAAGACC GCGCGCGCAT CGTCGCCAAT ATGCGTCAGG CAGGCATGAA 

201 TCCCGACCCC AAAACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAAGGCG 

251 GTTTGGAACT TGCCCCCGCG TTTTTCAGAA AACCGGAAGA CATAGAAACA 

301 ATGTTCAAAG CGGTACACGG CTGGGAACAT GTGCAGCAGG CTTTGGACAA 

351 ACACGAAGGG CTGCTATTCA TCACGCCGCA CATCGGCAGC TACGATTTGG 

401 GCGGACGCTA CATCAGCCAG CAGCTTCCGT TCCCGCTGAC CGCCATGTAC

451 AAACCGCCGA AAATCAAAGC GATAGACAAA ATCATGCAGG CGGGCAGGGT

501 TCGCGGCAAA GGAAAAACCG CGCCTACCAG CATACAAGGG GTCAAACAAA 

551 TCATCAAAGC CCTGCGTTCG GGCGAAGCAA CCATCGTCCT GCCCGACCAC 

601 GTCCCCTCCC CTCAAGAAGG CGGGGAAGGC GTATGGGTGG ATTTCTTCGG 

651 CAAACCTGCC TATACCATGA CGCTGGCGGC AAAATTGGCA CACGTCAAAG 

701 GCGTGAAAAC CCTGTTTTTC TGCTGCGAAC GCCTGCCTGG CGGACAAGGT 

751 TTCGATTTGC ACATCCGCCC CGTCCAAGGG GAATTGAACG GCGACAAAGC

801 CCATGATGCC GCCGTGTTCA ACCGCAATGC CGAATATTGG ATACGCCGTT 

851 TTCCGACGCA GTATCTGTTT ATGTACAACC GCTACAAAAT GCCGTAA 

This encodes a protein having amino acid sequence <SEQ ID 570>: 



1 MFRLQFRLFP PLRTAMHILL TALLKCLSLL PLSCLHTLGN RLGHLAFYLL

51 KEDRARIVAN MRQAGLNPDP KTVKAVFAET AKGGLELAPA FFRKPEDIET 

101 MFKAVHGWEH VQQALDKHEG LLFITPHIGS YDLGGRYISQ QLPFPLTAMY 

151 KPPKIKAIDK IMQAGRVRGK GKTAPTSIQG VKQIIKALRS GEATIVLPDH 

201 VPSPQEGGEG VWVDFFGKPA YTMTLAAKLA HVKGVKTLFF CCERLPGGQG 

251 FDLHIRPVQG ELNGDKAHDA AVFNRNAEYW IRRFPTQYLF MYNRYKMP* 

ORF138a and ORF138-1 show 99.7% identity over a 298aa overlap: 



orf138a.pep  MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAN
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf138-1     MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAN

orf138a.pep  MRQAGMNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG
             |||||:||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf138-1     MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG

orf138a.pep  LLFITPHIGSYDLGGRYISQQLPFPLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTSIQG
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf138-1     LLFITPHIGSYDLGGRYISQQLPFPLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTSIQG

orf138a.pep  VKQIIKALRSGEATIVLPDHVPSPQEGGEGVWVDFFGKPAYTMTLAAKLAHVKGVKTLFF
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf138-1     VKQIIKALRSGEATIVLPDHVPSPQEGGEGVWVDFFGKPAYTMTLAAKLAHVKGVKTLFF

orf138a.pep  CCERLPGGQGFDLHIRPVQGELNGDKAHDAAVFNRNAEYWIRRFPTQYLFMYNRYKMP
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf138-1     CCERLPGGQGFDLHIRPVQGELNGDKAHDAAVFNRNAEYWIRRFPTQYLFMYNRYKMP

Homology with a predicted ORF from N. gonorrhoeae 

ORF138 shows 94.3% identity over a 123aa overlap with a predicted ORF (ORF138ng) from
N. gonorrhoeae:

orf138.pep  MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAX 60
            |||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||
orf138ng    MFRLQFRLFPPLRTAMHILLTALLKCLSLLSLSCLHTLGNRLGHLAFYLLKEDRARIVAN 60

orf138.pep  MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 120
            ||||||||| :||||||||||| |||||||||:|||||||||||||||||||||||| ||
orf138ng    MRQAGLNPDTQTVKAVFAETAKCGLELAPAFFKKPEDIETMFKAVHGWEHVQQALDKGEG 120

orf138.pep  LLF 123
            |||
orf138ng    LLFITPHIGSYDLGGRYISQQLPFHLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTGIQG 180

The complete length ORF138ng nucleotide sequence <SEQ ID 571> is:

1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA 

51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGCTG TCGCTTTCCT 

101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA

151 AAGGAAGACC GCGCGCGCAT CGTCGCCAAT ATGCGGCAGG CGGGTTTGAA 

201 CCCCGACACG CAGACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAATGCG 

251 GTTTGGAACT TGCCCCCGCG TTTTTCAAAA AACCGGAAGA CATCGAAACA 

301 ATGTTCAAAG CGGTACACGG CTGGGAACAC GTGCAGCAGG CTTTGGACAA 

351 GGGCGAAGGG CTGCTGTTCA TCACGCCGCA CATCGGCAGC TACGATTTGG

401 GCGGACGCTA CATCAGCCAG CAGCTTCCGT TCCACCTGAC CGCCATGTAC

451 AAGCCGCCGA AAATCAAAGC GATAGACAAA ATCATGCAGG CGGGCAGGGT

501 GCGCGGCAAA GGCAAAACcg cgcccaccgg catACAAGGG GTCAAACAAA 

551 tcatcaAGGC CCTGCGCGCG GGCGAGGCAA CCAtcATCCT GCCCGACCAC 

601 GTCCCTTCTC CGCAGGAagg cggCGGCGTG TGGGCGGATT TTTTCGGCAA

651 ACCTGCATAc acCATGACAC TGGCGGCAAA ATTGGCACAC GTCAAAGGCG 

701 TGAAAACCCT GTTTTTCTGC TGCGAACGCC TGCCCGACGG ACAAGGCTTC

751 GTGTTGCACA TCCGCCCCGT CCAAGGGGAA TTGAACGGCA ACAAAGCCCA

801 CGATGCCGCC GTGTTCAACC GCAATACCGA ATATTGGATA CGCCGTTTTC 

851 CGACGCAGTA TCTGTTTATG TACAACCGCT ATAAAACGCC GTAA

This encodes a protein having amino acid sequence <SEQ ID 572>: 

1 MFRLQFRLFP PLRTAMHILL TALLKCLSLL SLSCLHTLGN RLGHLAFYLL

51 KEDRARIVAN MRQAGLNPDT QTVKAVFAET AKCGLELAPA FFKKPEDIET 

101 MFKAVHGWEH VQQALDKGEG LLFITPHIGS YDLGGRYISQ QLPFHLTAMY 

151 KPPKIKAIDK IMQAGRVRGK GKTAPTGIQG VKQIIKALRA GEATIILPDH

201 VPSPQEGGGV WADFFGKPAY TMTLAAKLAH VKGVKTLFFC CERLPDGQGF 

251 VLHIRPVQGE LNGNKAHDAA VFNRNTEYWI RRFPTQYLFM YNRYKTP* 

ORF138ng and ORF138-1 show 94.3% identity over 299aa overlap: 

orf138-1.pep  MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAN
              |||||||||||||||||||||||||||||| |||||||||||||||||||||||||||||
orf138ng      MFRLQFRLFPPLRTAMHILLTALLKCLSLLSLSCLHTLGNRLGHLAFYLLKEDRARIVAN

orf138-1.pep  MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG
              ||||||||| :||||||||||| |||||||||:|||||||||||||||||||||||| ||
orf138ng      MRQAGLNPDTQTVKAVFAETAKCGLELAPAFFKKPEDIETMFKAVHGWEHVQQALDKGEG

orf138-1.pep  LLFITPHIGSYDLGGRYISQQLPFPLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTSIQG
              |||||||||||||||||||||||| |||||||||||||||||||||||||||||||:|||
orf138ng      LLFITPHIGSYDLGGRYISQQLPFHLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTGIQG

orf138-1.pep  VKQIIKALRSGEATIVLPDHVPSPQEGGEGVWVDFFGKPAYTMTLAAKLAHVKGVKTLFF
              |||||||||:|||||:|||||||||||| |||:|||||||||||||||||||||||||||
orf138ng      VKQIIKALRAGEATIILPDHVPSPQEGG-GVWADFFGKPAYTMTLAAKLAHVKGVKTLFF

orf138-1.pep  CCERLPGGQGFDLHIRPVQGELNGDKAHDAAVFNRNAEYWIRRFPTQYLFMYNRYKMP
              |||||| |||| ||||||||||||:|||||||||||:||||||||||||||||||| |
orf138ng      CCERLPDGQGFVLHIRPVQGELNGNKAHDAAVFNRNTEYWIRRFPTQYLFMYNRYKTP

In addition, ORF138ng is homologous to htrB protein from Pseudomonas fluorescens: 

gnl|PID|e334283 (Y14568) htrB [Pseudomonas fluorescens] Length = 253
Score = 80.8 bits (196), Expect = 9e-15

Identities = 49/151 (32%), Positives = 79/151 (51%), Gaps = 6/151 (3%) 

Query: 101 MFKAVHGWEHVQQALDKGEGLLFITPHIGSYD-LGGRYISQQLPFHLTAMYKPPKIKAID 159
           + + V G E +++AL  G+G++ IT H+G+++ L   Y SQ  P      Y+PPK+KA+D
Sbjct: 94  LVREVEGLEVLKEALASGKGVVGITSHLGNWEVLNHFYCSQCKPI---IFYRPPKLKAVD 150

Query: 160 KIMQAGRVRGKGKTAPTGIQGVKQIIKALRAGEATIILPDHVPSPQEGGGVWADFFGKPA 219
           ++++  RV+   K A +  +G+  +IK +R G    I  D  P P E  G++  FF   A
Sbjct: 151 ELLRKQRVQLGNKVAASTKEGILSVIKEVRKGGQVGIPAD--PEPAESAGIFVPFFATQA 208

Query: 220 YTMTLAAKLAHVKGVKTLFFCCERLPDGQGF 250
            T      +        +F    RLPDG G+
Sbjct: 209 LTSKFVPNMLAGGKAVGVFLHALRLPDGSGY 239



Based on this analysis, including the presence of a putative transmembrane domain in the
gonococcal protein, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
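
The method used to predict the transmembrane domain is not stated here. One common heuristic is a sliding-window average of the Kyte-Doolittle hydropathy index. The Python sketch below assumes a 19-residue window and a cutoff of 1.6, both conventional choices rather than values taken from this document; `hydrophobic_windows` is a hypothetical helper name.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(protein, window=19, cutoff=1.6):
    """Return (1-based start, mean hydropathy) for each window at or above the cutoff."""
    hits = []
    for start in range(len(protein) - window + 1):
        segment = protein[start:start + window]
        # Unknown residues such as 'X' contribute 0.0 (an assumption of this sketch).
        mean = sum(KD.get(res, 0.0) for res in segment) / window
        if mean >= cutoff:
            hits.append((start + 1, round(mean, 2)))
    return hits


# N-terminal 50 residues of the gonococcal protein (SEQ ID 572).
orf138ng_nterm = "MFRLQFRLFPPLRTAMHILLTALLKCLSLLSLSCLHTLGNRLGHLAFYLL"
for start, mean in hydrophobic_windows(orf138ng_nterm):
    print(start, mean)
```

Windows that pass the cutoff mark candidate membrane-spanning segments; under these assumed parameters the window beginning at residue 16 (MHILLTALLKCLSLLSLSC) is among those reported.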

ORF138-1 (57kDa) was cloned in the pGex vectors and expressed in E. coli, as described above.
The products of protein expression and purification were analyzed by SDS-PAGE. Figure 14A
shows the results of affinity purification of the GST-fusion protein. Purified GST-fusion protein
was used to immunise mice, whose sera were used for ELISA (positive result) and FACS analysis
(Figure 14B). These experiments confirm that ORF138-1 is a surface-exposed protein, and that it
is a useful immunogen.

Example 69 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 573>: 

1 ..GCGTGGTCGG CCGGCGAATC GTGGCGTGTG TTAATGGAAA GTGAAACGTG

51 GCATGCGGTG TGGAATACTT TGCGCTTCTC GGCGGCGGCG GTGTATGCGG 

101 CAGCGGTTTT GGGTGTGGTG TATGCGGCGC CGGCGCGGCG GTCGGCGTGG 

151 ATGCGCGGGC TGATGTTTTA GCCGTTTATG GTGTCGCCGG TTTGTGTTTC 

201 GGCGGGCGTG CTGCTGCTTT ATCCGCAGTG GACGGCTTCG TTGCCGTTGC 

251 TGCTGGCGAT GTATGCGCTG CTGGCGTATC CGTTTGTGGC AAAAGATGTT

301 TTATCAGCCT GGGATGCACT GCCGCCGGAT TACGGCAGGG CGGCGGCGGG 

351 TTTGGGTGCA AACGGCTTTC AGACGGCATG CCGCATCACG TTCCCCCTCT 

401 TGAAACCGGC GTTGCGGCGC GGTCTGACTT TGGCGGCGGC AACCTGCGTG

451 GGCGAATTTG CGGCGACATT GTTTCTGTCG CGTCCGGAAT GGCAGACGCT

501 GACGACTTTG ATTTATGCCT ATTTGGGACG CGCGGGTGAG GATAATTACG

551 CGCGGGCGAT GGTGCTG. . 

This corresponds to the amino acid sequence <SEQ ID 574; ORF139>: 



1 ..AWSAGESWRV LMESETWHAV WNTLRFSAAA VYAAAVLGVV YAAPARRSAW
51 MRGLMFXPFM VSPVCVSAGV LLLYPQWTAS LPLLLAMYAL LAYPFVAKDV
101 LSAWDALPPD YGRAAAGLGA NGFQTACRIT FPLLKPALRR GLTLAAATCV
151 GEFAATLFLS RPEWQTLTTL IYAYLGRAGE DNYARAMVL..

Further work revealed the complete nucleotide sequence <SEQ ID 575>: 

1 ATGGATGGAC GGCGTTGGGT GGTATGGGGT GCTTTTGCCC TGCTGCCTTC 

51 GGCTTTTTTG GCGGTAATGG TCGTTGCGCC TTTGTGGGCG GTGGCGGCGT 

101 ATGACGGTTT GGCGTGGCGC GCGGTGCTGT CGGATGCCTA TATGCTCAAA 

151 CGTTTGGCGT GGACGGTATT TCAGGCAGCG GCAACCTGTG TGCTGGTGCT 

201 GCCTTTGGGC GTGCCTGTCG CGTGGGTGCT GGCGCGGCTG GCGTTTCCGG 

251 GGCGGGCTTT GGTGCTGCGC CTGCTGATGC TGCCTTTTGT GATGCCCACG 

301 TTGGTGGCGG GCGTGGGCGT GCTGGCCCTG TTCGGGGCGG ACGGGCTGTT 

351 GTGGCGCGGC AGGCAGGATA CGCCGTATCT GTTGTTGTAC GGCAATGTGT 

401 TTTTCAACCT TCCTGTGTTG GTCAGGGCGG CGTATCAGGG GTTTGTGCAA 

451 GTGCCTGCGG CACGGCTTCA GACGGCACGG ACGTTGGGCG CGGGGGCGTG 

501 GCGGCGGTTT TGGGACATTG AAATGCCCGT TTTGCGCCCG TGGCTTGCCG 

551 GCGGCGTGTG CCTTGTCTTT CTGTATTGTT TTTCCGGGTT CGGGCTGGCG 

601 CTGCTGCTGG GCGGCAGCCG TTATGCCACG GTCGAAGTGG AAATTTACCA 

651 GTTGGTCATG TTCGAACTCG ATATGGCGGT TGCTTCGGTG CTGGTGTGGC 

701 TGGTGTTGGG GGTAACGGCG GCGGCAGGGT TGCTGTATGC GTGGTTCGGC

751 AGGCGCGCGG TTTCGGATAA GGCGGTTTCC CCTGTGATGC CGTCGCCGCC

801 GCAGTCGGTC GGGGAATATG TGCTGCTGGC GTTTGCGGCG GCGGTGTTGT 

851 CTGTGTGCTG CCTGTTTCCT TTGTTGGCAA TTGTTGTGAA AGCGTGGTCG 

901 GCCGGCGAAT CGTGGCGTGT GTTAATGGAA AGTGAAACGT GGCAGGCGGT 

951 GTGGAATACT TTGCGCTTCT CGGCGGCGGC GGTGTATGCG GCGGCGGTTT 

1001 TGGGTGTGGT GTATGCGGCG GCGGCGCGGC GGTCGGCGTG GATGCGCGGG 

1051 CTGATGTTTT TGCCGTTTAT GGTGTCGCCG GTTTGTGTTT CGGCGGGCGT 

1101 GCTGCTGCTT TATCCGCAGT GGACGGCTTC GTTGCCGTTG CTGCTGGCGA 

1151 TGTATGCGCT GCTGGCGTAT CCGTTTGTGG CAAAAGATGT TTTATCAGCC 

1201 TGGGATGCAC TGCCGCCGGA TTACGGCAGG GCGGCGGCGG GTTTGGGTGC 

1251 AAACGGCTTT CAGACGGCAT GCCGCATCAC GTTCCCCCTC TTGAAACCGG 

1301 CGTTGCGGCG CGGTCTGACT TTGGCGGCGG CAACCTGCGT GGGCGAATTT 

1351 GCGGCGACAT TGTTTCTGTC GCGTCCGGAA TGGCAGACGC TGACGACTTT 

1401 GATTTATGCC TATTTGGGAC GCGCGGGTGA GGATAATTAC GCGCGGGCGA 

1451 TGGTGCTGAC ATTGCTGTTG GCGGCGTTCG CGCTGGGTAT TTTCCTGCTG

1501 TTGGACGGCG GCGAAGGCGG AAAACAGACG GAAACGTTAT AA 

This corresponds to the amino acid sequence <SEQ ID 576; ORF139-1>:

1 MDGRRWVVWG AFALLPSAFL AVMVVAPLWA VAAYDGLAWR AVLSDAYMLK

51 RLAWTVFQAA ATCVLVLPLG VPVAWVLARL AFPGRALVLR LLMLPFVMPT

101 LVAGVGVLAL FGADGLLWRG RQDTPYLLLY GNVFFNLPVL VRAAYQGFVQ

151 VPAARLQTAR TLGAGAWRRF WDIEMPVLRP WLAGGVCLVF LYCFSGFGLA

201 LLLGGSRYAT VEVEIYQLVM FELDMAVASV LVWLVLGVTA AAGLLYAWFG

251 RRAVSDKAVS PVMPSPPQSV GEYVLLAFAA AVLSVCCLFP LLAIVVKAWS

301 AGESWRVLME SETWQAVWNT LRFSAAAVYA AAVLGVVYAA AARRSAWMRG

351 LMFLPFMVSP VCVSAGVLLL YPQWTASLPL LLAMYALLAY PFVAKDVLSA

401 WDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF

451 AATLFLSRPE WQTLTTLIYA YLGRAGEDNY ARAMVLTLLL AAFALGIFLL

501 LDGGEGGKQT ETL* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF139 shows 94.7% identity over a 189aa overlap with an ORF (ORF139a) from strain A of N.
meningitidis:

                                                  10        20        30
orf139.pep                                AWSAGESWRVLMESETWHAVWNTLRFSAAA
                                          |||||||||||||||||:||||| ||||||
orf139a     QSVGEYVLLAFAAAVXSVCCLFXLLAIVVKAWSAGESWRVLMESETWQAVWNTXRFSAAA
            270       280       290       300       310       320

                    40        50        60        70        80        90
orf139.pep  VYAAAVLGVVYAAPARRSAWMRGLMFXPFMVSPVCVSAGVLLLYPQWTASLPLLLAMYAL
            ||||||||||||| |||||||||||| |||||||||||||||| ||||||||||||||||
orf139a     VYAAAVLGVVYAAAARRSAWMRGLMFLPFMVSPVCVSAGVLLLXPQWTASLPLLLAMYAL
            330       340       350       360       370       380

                   100       110       120       130       140       150
orf139.pep  LAYPFVAKDVLSAWDALPPDYGRAAAGLGANGFQTACRITFPLLKPALRRGLTLAAATCV
            ||||||||||||| ||||||||||||||||||||||||||||||||||||||||||||||
orf139a     LAYPFVAKDVLSAXDALPPDYGRAAAGLGANGFQTACRITFPLLKPALRRGLTLAAATCV
            390       400       410       420       430       440

                   160       170       180      189
orf139.pep  GEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNYARAMVL
            |||||||| || |||||||||||| |||| |||||||||
orf139a     GEFAATLFXSRXEWQTLTTLIYAYXGRAGXDNYARAMVLTLLLAAFALGXFLLLDGGEGG
            450       460       470       480       490       500

The complete length ORF139a nucleotide sequence <SEQ ID 577> is:

1 ATGGATGGAC GGCGTTGGGC GGTATGGGGT GCTTTTGCCC TGCTGCCTTC 

51 GGCTTTTTTG GCGGCAATGG TCGTTGCGCC TTTGTGGGCG GTGGCGGCGT 

101 ATGACGGTTT GGCGTGGCGC GCGGTGCTGT CGGATGCCTA TATGCTCAAA 

151 CGTTTGGCGT GGACGGTATT TCAGGCAGCG GCAACCTGTG TGCTGGTGCT 

201 GCCTTTGGGC GTGCCTGTCG CGTGGGTGCT GGCGCGGCTG GCGTTTCCGG 

251 GGCGGGCTTT GGTGCTGCGC CTGCTGATGC TGCCTTTTGT GATGCCCACG 

301 TTGGTGGCGG GCGTGGGCGT GCTGGCTCTG TTCGGGGCGG ACGGCCTGTN 

351 GTGGCGCGGC TGGCAGGATA CGCCGTATCT GTTGTTGTAC GGCAATGTGT 

401 TTTTTNACCT TCCTGTGTTG GTCAGGGCGG CATATCAGGG GTTTGTGCAA 

451 GTGCCTGCGG CACGGCTTCA GACGGCACNG ACATTGGGCG CGGGGGCGTG

501 GCGGCGGTTT TGGGACATTG AAATGCCCGT TTTGCGCCCG TGGCTTGCCG 

551 GCGGCGTGTG CCTTGTCTTC CTGTATTGTT TTTCGGGGTT CGGGCTGGCA 

601 TTGCTGCTGG GCGGCAGCCG TTATGCCACG GTCGAAGTGG AAATTTACCA 

651 GTTGGTCATG TTCGAACTCG ATATGGCGGT TGCTTCGGTG CTNGTGTGGC 

701 TGGTGTNGGG GGTAACNGCG GCGGCAGGGT TGCTGTATGC GTGGTTCGGC

751 AGGCGCGCGG TTTCGGATAA GGCNGTTTCC CCTGTGATGC CGTCGCCGCC 

801 GCAGTCGGTC GGGGAATATG TGCTNCTGGC GTTTGCGGCG GCGGTGTNGT 

851 CTGTGTGCTG CCTGTTTCNT TTGTTGGCAA TTGTTGTGAA AGCGTGGTCG 

901 GCCGGCGAAT CGTGGCGTGT GTTAATGGAA AGTGAAACGT GGCAGGCGGT 

951 GTGGAATACT NTGCGCTTCT CGGCGGCGGC GGTGTATGCG GCGGCGGTTT 

1001 TGGGTGTGGT GTATGCGGCG GCGGCGCGGC GGTCGGCGTG GATGCGCGGG 

1051 CTGATGTTTT TGCCGTTTAT GGTGTCGCCG GTTTGTGTTT CGGCGGGCGT 

1101 GCTGCTGCTT NATCCGCAGT GGACGGCTTC GTTGCCGCTG CTGCTGGCGA

1151 TGTATGCGCT GCTGGCGTAT CCGTTTGTGG CAAAAGATGT TTTATCAGCC 

1201 TGNGATGCAC TGCCGCCGGA TTACGGCAGG GCGGCGGCGG GTTTGGGTGC 

1251 AAACGGCTTT CAGACGGCAT GCCGCATCAC GTTCCCCCTC TTGAAACCGG 

1301 CGTTGCGGCG CGGTCTGACT TTGGCGGCGG CAACCTGCGT GGGCGAATTT 

1351 GCGGCAACCT TGTTCNTGTC GCGTCNCGAG TGGCAGACGC TGACGACTTT 

1401 GATTTATGCC TATNTGGGAC GCGCGGGTGA NGATAATTAC GCGCGGGCGA

1451 TGGTGCTGAC ATTGCTGTTG GCGGCGTTCG CGCTGGGTAT NTTCCTGCTG

1501 TTGGACGGCG GCGAAGGCGG AAAACGGACG GAAACGTTAT AA 

This encodes a protein having amino acid sequence <SEQ ID 578>: 

1 MDGRRWAVWG AFALLPSAFL AAMVVAPLWA VAAYDGLAWR AVLSDAYMLK

51 RLAWTVFQAA ATCVLVLPLG VPVAWVLARL AFPGRALVLR LLMLPFVMPT

101 LVAGVGVLAL FGADGLXWRG WQDTPYLLLY GNVFFXLPVL VRAAYQGFVQ

151 VPAARLQTAX TLGAGAWRRF WDIEMPVLRP WLAGGVCLVF LYCFSGFGLA

201 LLLGGSRYAT VEVEIYQLVM FELDMAVASV LVWLVXGVTA AAGLLYAWFG

251 RRAVSDKAVS PVMPSPPQSV GEYVLLAFAA AVXSVCCLFX LLAIVVKAWS

301 AGESWRVLME SETWQAVWNT XRFSAAAVYA AAVLGVVYAA AARRSAWMRG

351 LMFLPFMVSP VCVSAGVLLL XPQWTASLPL LLAMYALLAY PFVAKDVLSA

401 XDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF

451 AATLFXSRXE WQTLTTLIYA YXGRAGXDNY ARAMVLTLLL AAFALGXFLL

501 LDGGEGGKRT ETL* 

ORF139a and ORF139-1 show 96.5% homology over a 514aa overlap: 

orf139a.pep  MDGRRWAVWGAFALLPSAFLAAMVVAPLWAVAAYDGLAWRAVLSDAYMLKRLAWTVFQAA
             ||||||:||||||||||||||:||||||||||||||||||||||||||||||||||||||
orf139-1     MDGRRWVVWGAFALLPSAFLAVMVVAPLWAVAAYDGLAWRAVLSDAYMLKRLAWTVFQAA

orf139a.pep  ATCVLVLPLGVPVAWVLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLALFGADGLXWRG
             |||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||
orf139-1     ATCVLVLPLGVPVAWVLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLALFGADGLLWRG

orf139a.pep  WQDTPYLLLYGNVFFXLPVLVRAAYQGFVQVPAARLQTAXTLGAGAWRRFWDIEMPVLRP
              |||||||||||||| ||||||||||||||||||||||| ||||||||||||||||||||
orf139-1     RQDTPYLLLYGNVFFNLPVLVRAAYQGFVQVPAARLQTARTLGAGAWRRFWDIEMPVLRP

orf139a.pep  WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEIYQLVMFELDMAVASVLVWLVXGVTA
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||| ||||
orf139-1     WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEIYQLVMFELDMAVASVLVWLVLGVTA

orf139a.pep  AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFAAAVXSVCCLFXLLAIVVKAWS
             |||||||||||||||||||||||||||||||||||||||||| |||||| ||||||||||
orf139-1     AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFAAAVLSVCCLFPLLAIVVKAWS

orf139a.pep  AGESWRVLMESETWQAVWNTXRFSAAAVYAAAVLGVVYAAAARRSAWMRGLMFLPFMVSP
             |||||||||||||||||||| |||||||||||||||||||||||||||||||||||||||
orf139-1     AGESWRVLMESETWQAVWNTLRFSAAAVYAAAVLGVVYAAAARRSAWMRGLMFLPFMVSP

orf139a.pep  VCVSAGVLLLXPQWTASLPLLLAMYALLAYPFVAKDVLSAXDALPPDYGRAAAGLGANGF
             |||||||||| ||||||||||||||||||||||||||||| |||||||||||||||||||
orf139-1     VCVSAGVLLLYPQWTASLPLLLAMYALLAYPFVAKDVLSAWDALPPDYGRAAAGLGANGF

orf139a.pep  QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFXSRXEWQTLTTLIYAYXGRAGXDNY
             ||||||||||||||||||||||||||||||||||| || |||||||||||| |||| |||
orf139-1     QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNY

orf139a.pep  ARAMVLTLLLAAFALGXFLLLDGGEGGKRTETLX
             |||||||||||||||| |||||||||||:|||||
orf139-1     ARAMVLTLLLAAFALGIFLLLDGGEGGKQTETLX

Homology with a predicted ORF from N. gonorrhoeae

ORF139 shows 95.2% identity over a 189aa overlap with a predicted ORF (ORF139ng) from
N. gonorrhoeae:

orf139.pep                                AWSAGESWRVLMESETWHAVWNTLRFSAAA 30
                                          ||||||| |||||||||:||||||||||||
orf139ng    QSVGEYVLLAFSVAVLSVCCLFPLSAIVVKAWSAGESRRVLMESETWQAVWNTLRFSAAA 327

orf139.pep  VYAAAVLGVVYAAPARRSAWMRGLMFXPFMVSPVCVSAGVLLLYPQWTASLPLLLAMYAL 90
            |:||||||||||| ||| :|||||:| |||||||||||||||||| ||||||||||||||
orf139ng    VFAAAVLGVVYAAAARRLVWMRGLVFLPFMVSPVCVSAGVLLLYPGWTASLPLLLAMYAL 387

orf139.pep  LAYPFVAKDVLSAWDALPPDYGRAAAGLGANGFQTACRITFPLLKPALRRGLTLAAATCV 150
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf139ng    LAYPFVAKDVLSAWDALPPDYGRAAAGLGANGFQTACRITFPLLKPALRRGLTLAAATCV 447

orf139.pep  GEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNYARAMVL 189
            |||||||||||||||||||||||||||||||||||||||
orf139ng    GEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNYARAMVLTLLLSAFAVCIFLLLDNGEGG 507

The complete length ORF139ng nucleotide sequence <SEQ ID 579> is predicted to encode a 
protein having amino acid sequence <SEQ ID 580>: 

1 MDGRCWAVRG AFSLLPSAFL AVMVVAPLWA VAAYDGLAWR AVLSDAYMLK

51 RLAWTVFQAA ATCVLVLPLG VPVAWVLARL AFPGRALVLR LLMLPFVMPT

101 LVAGVGVLAL FGADGLLWRG RQDTPYLLLY GNVFFNLPVL VRAAYQGFAQ

151 VPAARLQTAR TLGAGAWRFF WDIEMPVLRP WLAGGVCLVF LYCFSGFGLA

201 LLLGGSRYAT VEVEIYQLVM FELDMAGASA LVWLVLGVTA AAGLLYAWFG

251 RRAVSDKAVS PVMPSPPQSV GEYVLLAFSV AVLSVCCLFP LSAIVVKAWS

301 AGESRRVLME SETWQAVWNT LRFSAAAVFA AAVLGVVYAA AARRLVWMRG

351 LVFLPFMVSP VCVSAGVLLL YPGWTASLPL LLAMYALLAY PFVAKDVLSA

401 WDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF 

451 AATLFLSRPE WQTLTTLIYA YLGRAGEDNY ARAMVLTLLL SAFAVCIFLL 

501 LDNGEGGKRT ETL* 

Further work revealed a variant gonococcal DNA sequence <SEQ ID 581>:

1 ATGGATGGAC GGTGTTGGGC GGTACGGGGT GCTTTTTCCC TGCTGCCTTC 

51 GGCTTTTTTG GCGGTAATGG TCGTTGCGCC TTTGTGGGCG GTGGCGGCGT 

101 ATGACGGTTT GGCGTGGCGC GCGGTGCTGT CGGATGCCTA TATGCTCAAA 

151 CGTTTGGCGT GGACGGTGTT TCAGGCGGCG GCAACCTGTG TGCTGGTGCT 

201 GCCTTTGGGC GTGCCTGTCG CGTGGGTGCT GGCGCGGCTG GCGTTCCCGG 

251 GGCGGGCTTT GGTGCTGCGC CTGCTGATGC TGCCGTTTGT GATGCCCACG 

301 CTGGTGGCGG GCGTGGGCGT GCTGGCTCTG TTCGGGGCGG ACGGGCTGTT 

351 GTGGCGCGGC CGGCAGGATA CGCCGTATCT GTTGTTGTAC GGCAATGTGT 

401 TTTTCAACCT GCCCGTGTTG GTCAGGGCGG CGTATCAGGG GTTTGCTCAA 

451 GTGCCTGCGG CACGGCTTCA GACGGCACGG ACGTTGGGCG CGGGGGCGTG

501 GCGGCGGTTT TGGGACATTG AAATGCCCGT TTTGCGCCCG TGGCTTGCCG 

551 GCGGCGTGTG CCTTGTCTTC CTGTATTGTT TTTCGGGGTT CGGGCTGGCA 

601 TTGCTGTTGG GCGGCAGCCG TTATGCCACG GTCGAAGTGG AAATTTACCA 

651 GTTGGTTATG TTCGAACTCG ATATGGCGGG GGCTTCGGCG CTGGTGTGGC 

701 TGGTGTTGGG GGTAACGGCG GCGGCAGGGT TGCTGTATGC GTGGTTCGGC 

751 AGGCGCGCGG TTTCGGATAA GGCGGTTTCC CCCGTGATGC CGTCGCCGCC 

801 GCAATCGGTG GGGGAATATG TATTGCTGGC ATTTTCGGTG GCGGTGTTGT 

851 CCGTGTGCTG CCTGTTTCCT TTGTCGGCAA TTGTTGTGAA AGCGTGGTCG 

901 GCCGGCGAAT CGCGGCGTGT GTTAATGGAA AGTGAAACGT GGCAGGCAGT 

951 GTGGAATACt ttGCGCTTTT CGGCGGCGGC GGTGTTTGCG GCGGCGGTTT 

1001 TGGGTGTGGT GTATGCGGCG GCGGCGCGGC GGCTGGTGTG GATGCGCGGA 

1051 CTGGTGTTTT TACCGTTTAT GGTGTCGCCG GTTTGTGTTT CGGCGGGCGT 

1101 GCTGCTGCTT TATCCGGGGT GGACGGCTTC GTTACCGCTG CTGCTGGCGA 

1151 TGTATGCGCT GCTGGCGTAT CCGTTTGTGG CAAAAGATGT TTTATCGGCC 

1201 TGGGATGCAC TGCCGCCGGA TTACGGCAGG GCGGCGGCAG GTTTGGGCGC 

1251 AAACGGCTTT CAGACGGCAT GCCGTATCAC GTTCCCCCTC TTGAAACCGG 

1301 CGTTGCGGCG CGGTCTGACT TTGGCGGCGG CGACGTGTGT GGGCGAATTT 

1351 GCGGCAACCT TGTTCCTGTC GCGTCCGGAA TGGCAGACGT TGACGACTTT 

1401 GATTTATGCC TATTTGGGGC GTGCGGGTGA GGACAATTAT GCGCGGGCAA 

1451 TGGTGTTGAC ATTGCTGTTG TCGGCATTTG CGGTGTGCAT TTTCCTGCTG

1501 TTGGACAACG GCGAAGGCGg aaaACGGACG GAAACGTTAT AA 

This corresponds to the amino acid sequence <SEQ ID 582; ORF139ng-1>:

1 MDGRCWAVRG AFSLLPSAFL AVMVVAPLWA VAAYDGLAWR AVLSDAYMLK

51 RLAWTVFQAA ATCVLVLPLG VPVAWVLARL AFPGRALVLR LLMLPFVMPT

101 LVAGVGVLAL FGADGLLWRG RQDTPYLLLY GNVFFNLPVL VRAAYQGFAQ

151 VPAARLQTAR TLGAGAWRRF WDIEMPVLRP WLAGGVCLVF LYCFSGFGLA

201 LLLGGSRYAT VEVEIYQLVM FELDMAGASA LVWLVLGVTA AAGLLYAWFG

251 RRAVSDKAVS PVMPSPPQSV GEYVLLAFSV AVLSVCCLFP LSAIVVKAWS

301 AGESRRVLME SETWQAVWNT LRFSAAAVFA AAVLGVVYAA AARRLVWMRG

351 LVFLPFMVSP VCVSAGVLLL YPGWTASLPL LLAMYALLAY PFVAKDVLSA

401 WDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF

451 AATLFLSRPE WQTLTTLIYA YLGRAGEDNY ARAMVLTLLL SAFAVCIFLL

501 LDNGEGGKRT ETL*
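
The correspondence between a nucleotide listing and its amino acid listing can be checked by translating the open reading frame codon by codon. The following Python sketch is illustrative only and is not part of the original disclosure. It assumes the standard codon assignments, and the helper name `translate` is hypothetical. It is applied here only to the first 30 and the last 12 bases of <SEQ ID 581>.

```python
# Illustrative sketch only: translate an ORF using the standard codon table.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Build the 64-entry table; codons are ordered T, C, A, G at each position.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string read in frame from its first base.

    Whitespace is ignored and lower-case bases are treated like upper-case
    ones. A trailing partial codon is dropped; unknown codons become 'X'.
    """
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, usable, 3))


# First 30 and last 12 bases of <SEQ ID 581>, copied from the listing.
start = "ATGGATGGAC GGTGTTGGGC GGTACGGGGT"
end = "GAAACGTTAT AA"
print(translate(start))  # MDGRCWAVRG
print(translate(end))    # ETL*
```

Both fragments reproduce the first ten residues and the final three residues plus stop codon of <SEQ ID 582>.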

ORF139ng-1 and ORF139-1 show 95.9% identity over a 513aa overlap:

orf139ng MDGRCWAVRGAFSLLPSAFLAVMVVAPLWAVAAYDGLAWRAVLSDAYMLKRLAWTVFQAA
1(11 1:1 It I : i I I M I I I I I I t I I I I I I t M I I I I I I II I I I I I I M I 1 I I I I M I I
orf139-1 MDGRRWVVWGAFALLPSAFLAVMVVAPLWAVAAYDGLAWRAVLSDAYMLKRLAWTVFQAA

orf139ng ATCVLVLPLGVPVAWVLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLALFGADGLLWRG
I I I I ) I I I I I ! I I I I I I I I I II I 1 I I I I I I I I M I II I I I I I I I I I I I I I I I I I I I I I I I
orf139-1 ATCVLVLPLGVPVAWVLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLALFGADGLLWRG

orf139ng RQDTPYLLLYGNVFFNLPVLVRAAYQGFAQVPAARLQTARTLGAGAWRRFWDIEMPVLRP
I I I I I I li II II I I I I I I I I I I i I I I II : I I I I I I I I I I I I I i i I I 1 I I I I M M I I i i I
orf139-1 RQDTPYLLLYGNVFFNLPVLVRAAYQGFVQVPAARLQTARTLGAGAWRRFWDIEMPVLRP

orf139ng WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEIYQLVMFELDMAGASALVWLVLGVTA
I I I I I I I I M I II I I I I I II I I I I M I I I I I I I I I I I I I I II I I I I 11:1111111111
orf139-1 WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEIYQLVMFELDMAVASVLVWLVLGVTA

orf139ng AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFSVAVLSVCCLFPLSAIVVKAWS
I ! I M I I I I I I I I M I I I I I 1 I I I I I II I I I I I I I I I | : : | | | | | | | | M I I I I I I I I I
orf139-1 AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFAAAVLSVCCLFPLLAIVVKAWS
orf139ng AGESRRVLMESETWQAVWNTLRFSAAAVFAAAVLGVVYAAAARRLVWMRGLVFLPFMVSP

° g MM MMM III || (til III I II II: til I II I It I II I II =11111 = in mil 

orf139-1 AGESWRVLMESETWQAVWNTLRFSAAAVYAAAVLGVVYAAAARRSAWMRGLMFLPFMVSP

orf139ng VCVSAGVLLLYPGWTASLPLLLAMYALLAYPFVAKDVLSAWDALPPDYGRAAAGLGANGF

9 llllllimil IIIIIIIIMIMIIIIMIIIIIIIIMMIMMIIllMmil 

orf139-1 VCVSAGVLLLYPGWTASLPLLLAMYALLAYPFVAKDVLSAWDALPPDYGRAAAGLGANGF

orf139ng QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNY

m m it 1 1 1 1 1 it i m 1 1 1 1 1 1 1 1 m 1 1 1 1 1 iii 1 1 1 1 1 1 1 1 1 1 1 m in m 1 1 m I 

orf 139-1 QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFLSRPEWQTLTTLIYAYLGRAGEDMY 

orf139ng ARAMVLTLLLSAFAVCIFLLLDNGEGGKRTETL

I M I I I I I I I : I I I : I I M I I : I I M I : I I M 
orf139-1 ARAMVLTLLLAAFALGIFLLLDGGEGGKQTETL

Based on the presence of a predicted binding-protein-dependent transport systems inner membrane 
component signature (underlined) in the gonococcal protein, it is predicted that the proteins from 
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or 
diagnostics, or for raising antibodies. 



Example 70 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 583>: 

1 ATGGACGGCT GGACACAGAC GCTGTCCGCG CAAACCCTGT TGGGCATTTC 

51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAGA TTCCGCATCC 

101 ACGCGCTGCT GACACTGGTC ATCGTCAGCC TGCTGACGGC TTTGGCAACC 

151 GGTTTGCCCA CAGGCAGCAT TGTCAAAGAC ATACTGGTCA AAAACTTCGG 

201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGCCTGGGC GCGATGCTCG 

251 AACGTTTGGT C...

This corresponds to the amino acid sequence <SEQ ID 584; ORF140>: 



1 MDGWTQTLSA QTLLGISAAA IILILILIVR FRIHALLTLV IVSLLTALAT 
51 GLPTGSIVKD ILVKNFGGTL GGVALLVGLG AMLERLV..

Further work revealed the complete nucleotide sequence <SEQ ID 585>: 



1 ATGGACGGCT GGACACAGAC GCTGTCCGCG CAAACCCTGT TGGGCATTTC 

51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAAA TTCCGCATCC 

101 ACGCGCTGCT GACACTGGTC ATCGTCAGCC TGCTGACGGC TTTGGCAACC 

151 GGTTTGCCCA CAGGCAGCAT TGTCAACGAC ATACTGGTCA AAAACTTCGG 

201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGCCTGGGC GCGATGCTCG 

251 GACGTTTGGT CGAAACATCC GGCGGCGCAC AGTCGCTGGC GGACGCGCTG 

301 ATCCGGATGT TCGGCGAAAA ACGCGCACCG TTCGCGCTGG GCGTTGCCTC 

351 GCTGATTTTC GGCTTCCCGA TTTTCTTCGA TGCCGGACTA ATCGTCATGC 

401 TGCCCATCGT GTTCGCCACC GCACGGCGCA TGAAACAGGA CGTACTGCCC 

451 TTCGCGCTTG CCTCCATCGG CGCATTTTCC GTCATGCACG TCTTCCTGCC 

501 GCCCCATCCG GGCCCGATTG CCGCTTCCGA ATTTTACGGC GCGAACATCG 

551 GCCAAGTTTT GATTTTGGGT CTGCCGACCG CCTTCATCAC ATGGTATTTC 

601 AGCGGCTATA TGCTCGGCAA AGTGTTGGGG CGCACCATCC ATGTTCCCGT 

651 TCCCGAACTG CTCAGCGGCG GCACGCAAGA CAACGACCTG CCGAAAGAAC 

701 CTGCCAAAGC AGGAACGGTC GTCGCCATCA TGCTGATTCC CATGCTGCTG

751 ATTTTCCTGA ATACCGGCGT ATCGGCCCTC ATCAGCGAAA AACTCGTAAG

801 TGCGGACGAA ACCTGGGTTC AGACGGCAAA AATAATCGGT TCGACACCGA 

851 TCGCCCTTCT GATTTCCGTA TTGGTCGCAC TGTTTGTCTT GGGACGCAAA 

901 CGCGGCGAAA GCGGCAGCGC GTTGGAAAAA ACCGTGGACG GCGCACTCGC 

951 CCCCGTCTGT TCCGTGATTC TGATTACCGG CGCGGGCGGT ATGTTCGGCG 

1001 GCGTTTTGCG CGCTTCCGGC ATCGGCAAGG CACTCGCCGA CAGCATGGCG 

1051 GATTTGGGCA TTCCCGTCCT TTTGGGCTGT TTCCTTGTCG CCTTGGCACT 

1101 GCGTATCGCG CAAGGTTCGG CAACCGTCGC CCTGACCACC GCCGCCGCGC 

1151 TGATGGCTCC TGCCGTTGCC GCCGCCGGCT TTACCGACTG GCAGCTCGCC 
1201 TGTATCGTAT TGGCAACGGC GGCAGGTTCG GTCGGTTGCA GCCACTTCAA

1251 CGACTCCGGC TTCTGGCTGG TCGGCCGTCT CTTGGACATG GACGTACCGA 

1301 CCACGCTGAA AACCTGGACG GTCAACCAAA CCCTCATCGC ACTCATCGGC 

1351 TTTGCCTTGT CCGCACTGCT GTTCGCCATC GTCTGA 

This corresponds to the amino acid sequence <SEQ ID 586; ORF140-1>: 

1 MDGWTQTLSA QTLLGISAAA IILILILIVK FRIHALLTLV IVSLLTALAT

51 GLPTGSIVND ILVKNFGGTL GGVALLVGLG AMLGRLVETS GGAQSLADAL

101 IRMFGEKRAP FALGVASLIF GFPIFFDAGL IVMLPIVFAT ARRMKQDVLP

151 FALASIGAFS VMHVFLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF

201 SGYMLGKVLG RTIHVPVPEL LSGGTQDNDL PKEPAKAGTV VAIMLIPMLL

251 IFLNTGVSAL ISEKLVSADE TWVQTAKIIG STPIALLISV LVALFVLGRK

301 RGESGSALEK TVDGALAPVC SVILITGAGG MFGGVLRASG IGKALADSMA

351 DLGIPVLLGC FLVALALRIA QGSATVALTT AAALMAPAVA AAGFTDWQLA

401 CIVLATAAGS VGCSHFNDSG FWLVGRLLDM DVPTTLKTWT VNQTLIALIG

451 FALSALLFAI V*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A)

ORF140 shows 95.4% identity over an 87aa overlap with an ORF (ORF140a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf140.pep MDGWTQTLSAQTLLGISAAAIILILILIVRFRIHALLTLVIVSLLTALATGLPTGSIVKD
I I |l t I U I I M I I II I I t I I I I I I I t I I : I I I i I I I I f t I II I M I It I I I I II I M : 1
orf140a MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND

10 20 30 40 50 60

70 80
orf140.pep ILVKNFGGTLGGVALLVGLGAMLERLV
: I I I M I M M II I M I I ( I ( I I Ml
orf140a VLVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFALGVASLIF

70 80 90 100 110 120 
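
The identity figure quoted for an ungapped overlap such as this one is the number of matching positions divided by the overlap length. The Python sketch below is illustrative only and is not part of the original disclosure; the helper name `percent_identity` is hypothetical, and the two strings are the 87 residues of ORF140 and the first 87 residues of ORF140a as they appear in the alignment above.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two ungapped, equal-length strings."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# ORF140 (partial, 87 aa) and the matching region of ORF140a.
orf140 = ("MDGWTQTLSAQTLLGISAAAIILILILIVRFRIHALLTLVIVSLLTALAT"
          "GLPTGSIVKDILVKNFGGTLGGVALLVGLGAMLERLV")
orf140a_87 = ("MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALAT"
              "GLPTGSIVNDVLVKNFGGTLGGVALLVGLGAMLGRLV")

print(round(percent_identity(orf140, orf140a_87), 1))  # 95.4
```

Four of the 87 positions differ (R/K, K/N, I/V and E/G), giving 83/87, which agrees with the stated 95.4%. Gapped alignments need a proper alignment tool; this sketch only covers the ungapped case.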

The complete length ORF140a nucleotide sequence <SEQ ID 587> is: 



1 ATGGACGGCT GGACACAGAC GCTGTCCGCG CAAACCCTGT TGGGCATTTC 

51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAAA TTCCGCATCC 

101 ACGCGCTGCT GACACTGGTC ATCGTCAGCC TGCTGACGGC TTTGGCAACC 

151 GGTTTGCCCA CAGGCAGCAT TGTCAACGAC GTACTGGTCA AAAACTTCGG 

201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGCCTGGGC GCGATGCTCG 

251 GACGTTTGGT CGAAACATCC GGCGGCGCAC AGTCGCTGGC GGACGCGCTG 

301 ATCCGGATGT TCGGCGAAAA ACGCGCACCG TTCGCGCTGG GCGTTGCCTC 

351 GCTGATTTTC GGCTTCCCGA TTTTCTTCGA TGCCGGACTA ATCGTCATGC 

401 TGCCCATCGT GTTCGCCACC GCACGGCGCA TGAAACAGGA CGTACTGCCC 

451 TTCGCGCTTG CCTCCATCGG CGCATTTTCC GTCATGCACG TCTTCCTGCC 

501 GCCCCATCCG GGCCCGATTG CCGCTTCCGA ATTTTACGGC GCGAACATCG 

551 GCCAAGTTTT GATTTTGGGT CTGCCGACCG CCTTCATCAC ATGGTATTTC 

601 AGCGGCTATA TGCTCGGCAA AGTGTTGGGG CGCACCATCC ATGTTCCCGT 

651 TCCCGAACTG CTCAGCGGCG GCACGCAAGA CAACGACCTG CCGAAAGAAC 

701 CTGCCAAAGC AGGAACGGTC GTCGCCATCA TGCTGATTCC CATGCTGCTG

751 ATTTTCCTGA ATACCGGCGT ATCGGCCCTC ATCAGCGAAA AACTCGTAAG

801 TGCGGACGAA ACCTGGGTTC AGACGGCAAA AATAATCGGT TCGACACCGA 

851 TCGCCCTTCT GATTTCCGTA TTGGTCGCAC TGTTTGTCTT GGGACGCAAA 

901 CGCGGCGAAA GCGGCAGCGC GTTGGAAAAA ACCGTGGACG GCGCACTCGC 

951 CCCCGTCTGT TCCGTGATTC TGATTACCGG CGCGGGCGGT ATGTTCGGCG 

1001 GCGTTTTGCG CGCTTCCGGC ATCGGCAAGG CACTCGCCGA CAGCATGGCG 

1051 GATTTGGGCA TTCCCGTCCT TTTGGGCTGT TTCCTTGTCG CCTTGGCACT 

1101 GCGTATCGCG CAAGGTTCGG CAACCGTCGC CCTGACCACC GCCGCCGCGC 

1151 TGATGGCTCC TGCCGTTGCC GCCGCCGGCT TTACCGACTG GCAGCTCGCC 

1201 TGTATCGTAT TGGCAACGGC GGCAGGTTCG GTCGGTTGCA GCCACTTCAA 

1251 CGACTCCGGC TTCTGGCTGG TCGGCCGCCT CTTGGACATG GACGTACCGA 

1301 CCACGCTGAA AACCTGGACG GTCAACCAAA CCCTCATCGC ACTCATCGGC 

1351 TTTGCCTTGT CCGCACTGCT GTTCGCCATC GTCTGA 
This encodes a protein having amino acid sequence <SEQ ID 588>:

1 MDGWTQTLSA QTLLGISAAA IILILILIVK FRIHALLTLV IVSLLTALAT

51 GLPTGSIVND VLVKNFGGTL GGVALLVGLG AMLGRLVETS GGAQSLADAL

101 IRMFGEKRAP FALGVASLIF GFPIFFDAGL IVMLPIVFAT ARRMKQDVLP

151 FALASIGAFS VMHVFLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF

201 SGYMLGKVLG RTIHVPVPEL LSGGTQDNDL PKEPAKAGTV VAIMLIPMLL

251 IFLNTGVSAL ISEKLVSADE TWVQTAKIIG STPIALLISV LVALFVLGRK

301 RGESGSALEK TVDGALAPVC SVILITGAGG MFGGVLRASG IGKALADSMA

351 DLGIPVLLGC FLVALALRIA QGSATVALTT AAALMAPAVA AAGFTDWQLA

401 CIVLATAAGS VGCSHFNDSG FWLVGRLLDM DVPTTLKTWT VNQTLIALIG

451 FALSALLFAI V*

ORF140a and ORF140-1 show 99.8% identity over a 461aa overlap:

orf140-1.pep MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND 60
I I I I 1 I I I I I I I I I I I I t i i I I 11 I I ! I I I I I I I I I I I 1 I 1 1 1 I I I I 1 M 1 I I I M I ! I I
orf140a MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND 60

orf140-1.pep ILVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFALGVASLIF 120
: I I I I I I I t I t I I I I ( i I I i I I I I I I I M I I I I I I I I I 1 I I I I M I I I I I t I I ! I 1 I I I I
orf140a VLVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFALGVASLIF 120

orf140-1.pep GFPIFFDAGLIVMLPIVFATARRMKQDVLPFALASIGAFSVMHVFLPPHPGPIAASEFYG 180
I I I I I I I t I! I i ! I I I I M I I I I I I I I I II I i I I I I I I I i I I I M I I t I I t I I I I t I M I
orf140a GFPIFFDAGLIVMLPIVFATARRMKQDVLPFALASIGAFSVMHVFLPPHPGPIAASEFYG 180

orf140-1.pep ANIGQVLILGLPTAFITWYFSGYMLGKVLGRTIHVPVPELLSGGTQDNDLPKEPAKAGTV 240
I ! I I I I I II 1 I I I I I I I t I I I I I I I I I II I I I I I I I I II I I M I! M I I I ! I M I II i I I
orf140a ANIGQVLILGLPTAFITWYFSGYMLGKVLGRTIHVPVPELLSGGTQDNDLPKEPAKAGTV 240

orf140-1.pep VAIMLIPMLLIFLNTGVSALISEKLVSADETWVQTAKIIGSTPIALLISVLVALFVLGRK 300
I I I I I I I I I I I I II I I I I I II I I I I I t I I I I II I t I I t I I I I I I I I I I I II I I I I II I I I
orf140a VAIMLIPMLLIFLNTGVSALISEKLVSADETWVQTAKIIGSTPIALLISVLVALFVLGRK 300

orf140-1.pep RGESGSALEKTVDGALAPVCSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC 360
I i I I I I M I i I I If I I II I I M I I ( I i I i I I I I I i I i I I I II I I I I I I 1 I I I I I I I 11 M
orf140a RGESGSALEKTVDGALAPVCSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC 360

orf140-1.pep FLVALALRIAQGSATVALTTAAALMAPAVAAAGFTDWQLACIVLATAAGSVGCSHFNDSG 420
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I II I I I I II I I I I I I I I I I I
orf140a FLVALALRIAQGSATVALTTAAALMAPAVAAAGFTDWQLACIVLATAAGSVGCSHFNDSG 420

orf140-1.pep FWLVGRLLDMDVPTTLKTWTVNQTLIALIGFALSALLFAIV 461
I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
orf140a FWLVGRLLDMDVPTTLKTWTVNQTLIALIGFALSALLFAIV 461



Homology with a predicted ORF from N. gonorrhoeae

ORF140 shows 92% identity over an 87aa overlap with a predicted ORF (ORF140ng) from
N. gonorrhoeae:

orf140.pep MDGWTQTLSAQTLLGISAAAIILILILIVRFRIHALLTLVIVSLLTALATGLPTGSIVKD 60
III II I I I I I I I I I II I I I I I I I I I I I I : I I I : I I I I I I I : I I I I I I I I I I I I I I I I : I
orf140ng MDGRTQTLSAQTLLGISAAAIILILILIVKFRIRALLTLVIASLLTALATGLPTGSIVND 60

orf140.pep ILVKNFGGTLGGVALLVGLGAMLERLV 87
: ! I I I I II I I I I I I I I I I I ! I I I III
orf140ng VLVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFAPGVASLIF 120

The complete length ORF140ng nucleotide sequence <SEQ ID 589> was predicted to encode a
protein having amino acid sequence <SEQ ID 590>:



1 MDGRTQTLSA QTLLGISAAA IILILILIVK FRIRALLTLV IASLLTALAT

51 gLPTGSIVND VLVKNFGGTL GGVALLVGLG AMLGRLVETS GGAQSLADAL

101 IRMFGEKRAP FAPGVASLIF GFPIFFDAGL IVMLPIVFAT ARRMKQDVLP

151 FALASVGAFS VMHVFLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF

201 SGYMLGKVLG RAIHVPVPEL LSGGTQDSDP PKEPAKAGTV VAVMLIPMLL

251 IFLNTGVSAL ISEKLVSADE TWVQTAKMIG STPVALLISV LAALLVLGRK

301 RGESGSTLEK TVDGALAPAC SVILITGAGG MFGGVLRASG IGKALADSMA

351 DLGIPVLLGC FLVALALRIA QGSATVALTT AAALMAPAVA AAGFTDWQLA

401 CIVLATAAGS VGCSHFNDSG FWLVGRLSDM DVPTTLKTWT VNQTLIAFIG

451 FALSALLFAI V*

Further work revealed a variant gonococcal DNA sequence <SEQ ID 591>: 

1 ATGGACGGCC GGACACAGAC GCTGTCCGCG CAAACCTTGT TGGGCATTTC 

51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAAA TTCCGCATCC 

101 GCGCGCTGCT GACACTGGTC ATCGCCAGCC TGCTGACGGC TTTGGCAACC 

151 GGTTTGCCCA CAGGCAGCAT CGTCAACGAC GTACTGGTCA AAAACTTCGG 

201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGTCTGGGC GCAATGCTCG 

251 GACGTTTGGT AGAAACATCC GGCGGCGCAC AGTCGCTGGC GGACGCGCTG 

301 ATCCGGATGT TCGGCGAAAA ACGCGCACCG TTCGCTCCGG GCGTTGCCTC 

351 GCTGATTTTC GGCTTCCCGA TTTTCTTCGA TGCCGGACTA ATCGTCATGC 

401 TGCCCATCGT ATTCGCCACC GCACGGCGCA TGAAACAGGA CGTACTGCCC 

451 TTCGCGCTTG CCTCCGTCGG CGCATTTTCC GTCATGCACG TCTTCCTGCC 

501 GCCCCATCCG GGCCCGATTG CCGCTTCCGA ATTTTACGGC GCGAACATCG 

551 GCCAGGTTTT GATTTTGGGT CTGCCGACCG CCTTCATCAC ATGGTATTTC 

601 AGCGGCTATA TGCTCGGCAA AGTGTTGGGG CGCGCCATCC ATGTTCCCGT 

651 TCCCGAACTG CTCAGCGGCG GCACGCAAGA CAGCGACCCG CCGAAAGAAC 

701 CTGCCAAAGC AGGAACGGTC GTCGCCGTCA TGCTGATTCC CATGCTGCTG

751 ATTTTCCTGA ATACCGGCGT ATCAGCCCTC ATCAGCGAAA AACTCGTAAG

801 TGCGGACGAA ACTTGGGTTC AGACGGCAAA AATGATCGGT TCGACACCTG 

851 TCGCCCTTCT GATTTCCGTA TTGGCCGCAC TGTTGGTCTT GGGACGCAAA 

901 CGCGGCGAAA GCGGCAGCAC GTTGGAAAAA ACCGTGGACG GCGCACTCGC 

951 CCCCGCCTGT TCCGTGATTC TGATTACCGG CGCGGGCGGT ATGTTCGGCG 

1001 GCGTTTTGCG CGCTTCCGGC ATCGGCAAGG CACTCGCCGA CAGCATGGCG 

1051 GATTTGGGCA TTCCCGTCCT TTTGGGCTGC TTCCTTGTCG CCTTGGCACT 

1101 GCGTATCGCG CAAGGTTCGG CAACCGTCGC CCTGACCACA GCCGCCGCGC 

1151 TGATGGCTCC TGCCGTTGCC GCCGCCGGCT TTACCGACTG GCAGCTCGCC 

1201 TGTATCGTAT TGGCAACGGC GGCAGGTTCG GTCGGTTGCA GCCACTTCAA 

1251 CGACTCCGGC TTCTGGCTGG TCGGCCGCCT CTTGGATATG GACGTACCGA 

1301 CCACGCTGAA AACCTGGACG GTCAACCAAA CCCTCATCGC ATTCATCGGC 

1351 TTTGCCTTGT CCGCACTGCT GTTTGCCATC GTCTGA 

This corresponds to the amino acid sequence <SEQ ID 592; ORF140ng-1>:

1 MDGRTQTLSA QTLLGISAAA IILILILIVK FRIRALLTLV IASLLTALAT

51 GLPTGSIVND VLVKNFGGTL GGVALLVGLG AMLGRLVETS GGAQSLADAL

101 IRMFGEKRAP FAPGVASLIF GFPIFFDAGL IVMLPIVFAT ARRMKQDVLP

151 FALASVGAFS VMHVFLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF

201 SGYMLGKVLG RAIHVPVPEL LSGGTQDSDP PKEPAKAGTV VAVMLIPMLL

251 IFLNTGVSAL ISEKLVSADE TWVQTAKMIG STPVALLISV LAALLVLGRK

301 RGESGSTLEK TVDGALAPAC SVILITGAGG MFGGVLRASG IGKALADSMA

351 DLGIPVLLGC FLVALALRIA QGSATVALTT AAALMAPAVA AAGFTDWQLA

401 CIVLATAAGS VGCSHFNDSG FWLVGRLLDM DVPTTLKTWT VNQTLIAFIG

451 FALSALLFAI V*

ORF140ng-1 and ORF140-1 show 96.3% identity over a 461aa overlap:

orf140ng-1.pep MDGRTQTLSAQTLLGISAAAIILILILIVKFRIRALLTLVIASLLTALATGLPTGSIVND
Ml I I i I ( M I I I I I I I I I I I i I I I I i I i I I I : I I I I I I I : I I i I I I I I II I I II I I I I
orf140-1 MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND

orf140ng-1.pep VLVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFAPGVASLIF
: t I M I t I I I I I I I I t I I I ! I i I I I I I I I I I I I I I I I I I I I I I I It I M I I I 1(1(111
orf140-1 ILVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFALGVASLIF

orf140ng-1.pep GFPIFFDAGLIVMLPIVFATARRMKQDVLPFALASVGAFSVMHVFLPPHPGPIAASEFYG
I I I I II I I I I ( I I I I I I I I I I I I I I ( I I I ( I i M I : I I I I I I I I I ( I II I I It I II II I I
orf140-1 GFPIFFDAGLIVMLPIVFATARRMKQDVLPFALASIGAFSVMHVFLPPHPGPIAASEFYG

orf140ng-1.pep ANIGQVLILGLPTAFITWYFSGYMLGKVLGRAIHVPVPELLSGGTQDSDPPKEPAKAGTV
M I I I I I II II I I I I I I I I I I I I II I I II I I : I I II I I I I II I I I I i : ( M I I ( I I I I I
orf140-1 ANIGQVLILGLPTAFITWYFSGYMLGKVLGRTIHVPVPELLSGGTQDNDLPKEPAKAGTV

orf140ng-1.pep VAVMLIPMLLIFLNTGVSALISEKLVSADETWVQTAKMIGSTPVALLISVLAALLVLGRK
| | : | | I | | M 1 | | | I I I I 1 I I I I I I I ! I I I M I I I I I : I I i I I : I M I I 1 I ■ I I ' i I I i f
orf140-1 VAIMLIPMLLIFLNTGVSALISEKLVSADETWVQTAKIIGSTPIALLISVLVALFVLGRK

orf140ng-1.pep RGESGSTLEKTVDGALAPACSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC
M i M i : I M t t I M I M : M M M I I I 1 t I M I I I M I 11 I M I M 1 11 i M I I I M I I
orf140-1 RGESGSALEKTVDGALAPVCSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC

orf140ng-1.pep FLVALALRIAQGSATVALTTAAALMAPAVAAAGFTDWQLACIVLATAAGSVGCSHFNDSG
MliMMMlMMMllllNIMIIiiiilitliillttMllttlMilllllMI
orf140-1 FLVALALRIAQGSATVALTTAAALMAPAVAAAGFTDWQLACIVLATAAGSVGCSHFNDSG

orf140ng-1.pep FWLVGRLLDMDVPTTLKTWTVNQTLIAFIGFALSALLFAIV
I I t I I t I I I I I I I I M t I I I i I I I II I : I I I i 1 I I I I M I I
orf140-1 FWLVGRLLDMDVPTTLKTWTVNQTLIALIGFALSALLFAIV

Furthermore, ORF140ng-1 is homologous to an E. coli protein:

gi|882633 (U29579) ORF_o454 [Escherichia coli] >gi|1789097 (AE000358) o454;
This 454 aa ORF is 34% identical (9 gaps) to 444 residues of an approx. 456 aa
protein GNTP_BACLI SW: P46832 [Escherichia coli] Length = 454

Score = 210 bits (529), Expect = 1e-53

Identities = 130/384 (33%), Positives = 194/384 (49%), Gaps = 19/384 (4%)

Query: 88 ETSGGAQSLADALIRMFGEKRAPFAPGVASLIFGFPIFFDAGLIVMLPIVFATARRMKQD 147
E SGGA+SLA+ R G+KR A +A+ G P+FFD G I++ PI++ A+ K



Query: 88 Sbjct: 80
Query: 148 Sbjct: 140
Query: 208 Sbjct: 199
Query: 258 Sbjct: 256
Query: 318 Sbjct: 313
Query: 378 Sbjct: 371
Query: 438 Sbjct: 431



L F L G +HV +PPHPGP+AA+ A+IG + I+G+ +1 GY 



E+L G T+ SD P A V ++++IP+ +1 



+S L+ + T ++IGS +RG S + AL 



A VIL+TGAGG+FG VL SG+GKALA+ + + +P+L F+++LALR +QGS 



G Q + LA G +G SH NDSGFW+V + L + V LK

VAI LTTGGLLSEAVMGLN PIQCVLV 

TWTVNQTLIAFIGFALSALLFAIV 
TWTV T++ F GF ++ ++A++ 


Based on this analysis, including the identification of a putative leader sequence
(double-underlined) and several putative transmembrane domains (single-underlined) in the
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 71

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 593>: 

1 ..GATTTCGGCA TATCGCCCGT GTATCTTTGG GTTGCCGCCG CGTTCAAACA
51 TTTGCTGTCG CCGTGGGCTG CCGACTCATA CGATGTCGCA CGCTTTGCAG
101 GCGTATTTTT TGCCGTTATC GGACTGACTT CCTGCGGCTT TGCCGGTTTC

151 AACTTTTTGG GCAGACACCA CGGGCGCAC. GTCGTCCTGA TTCTCATCGG

201 CTGTATCGGG CTGATTCCAG TTGCCCATTT CCTCAACCCC GCTGCCGCCG

251 CCTTTGCCGC CGCCGGACTG GTGCTGCACG GTTATTCTTT GGCTCGCCGG

301 CGCGTGATTG CCGCCTCTTT TCTGCTCGGT ACGGGCTGGA CGCTGATGTC

351 GTTGGCAGCA GCTTATCCGG CAGCATTTGC CCTGATGCTG CCCTTGCCCG

401 TACTGATGTT TTTCCGTCCG ..

This corresponds to the amino acid sequence <SEQ ID 594; ORF141>: 

1 ..DFGISPVYLW VAAAFKHLLS PWAADSYDVA RFAGVFFAVI GLTSCGFAGF
51 NFLGRHHGRX VVLILIGCIG LIPVAHFLNP AAAAFAAAGL VLHGYSLARR
101 RVIAASFLLG TGWTLMSLAA AYPAAFALML PLPVLMFFRP ..

Further work revealed the complete nucleotide sequence <SEQ ID 595>: 

1 ATGCTGACCT ATACCCCGCC CGATGCCCGC CCGCCCGCCA AAACCCACGA 

51 AAAGCCGTGG CTGCTGCTGT TGATGGCGTT TGCCTGGTTG TGGCCCGGCG 

101 TGTTTTCCCA CGATTTGTGG AATCCTGACG AACCTGCCGT CTATACCGCC 

151 GTCGAAGCAC TGGCAGGCAG CCCCACCCCC TTGGTTGCCC ATCTGTTCGG 

201 TCAAACCGAT TTCGGCATAC CGCCCGTGTA TCTTTGGGTT GCCGCCGCGT 

251 TCAAACATTT GCTGTCGCCG TGGGCTGCCG ACTCATACGA TGCCGCACGC 

301 TTTGCAGGCG TATTTTTTGC CGTTATCGGA CTGACTTCCT GCGGCTTTGC 

351 CGGTTTCAAC TTTTTGGGCA GACACCACGG GCGCAgCGTC GTCCTGATTC 

401 TCATCGGCTG TATCGGGCTG ATTCCAGTTG CCCATTTCCT CAACCCCGCT

451 GCCGCCGCCT TTGCCGCCGC CGGACTGGTG CTGCACGGTT ATTCTTTGGC

501 TCGCCGGCGC GTGATTGCCG CCTCTTTTCT GCTCGGTACG GGCTGGACGC 

551 TGATGTCGTT GGCAGCAGCT TATCCGGCAG CATTTGCCCT GATGCTGCCC 

601 TTGCCCGTAC TGATGTTTTT CCGTCCGTGG CAAAGCAGGC GTTTGATGTT 

651 GACGGCAGTC GCCTCACTTG CCTTTGCCCT GCCGCTTATG ACCGTTTACC 

701 CGCTGCTCTT GGCAAAAACG CAGCCCGCGC TGTTCGCGCA ATGGCTCGAC 

751 TATCACGTTT TCGGTACGTT CGGCGGCGTG CGGCACGTTC AGACGGCATT 

801 CAGTTTGTTT TACTATCTGA AAAACCTGCT TTGGTTTGCA TTGCCCGCGC 

851 TGCCGCTGGC GGTTTGGACG GTTTGCCGCA CGCGCCTGTT TTCGACCGAC 

901 TGGGGGATTT TGGGCGTCGT CTGGATGCTT GCCGTTTTGG TGCTGCTTGC 

951 CGTCAATCCG CAGCGTTTTC AGGATAACCT CGTCTGGCTG CTTCCGCCGC 

1001 TTGCCCTGTT CGGCGCGGCG CAACTGGACA GCCTGAGGCG CGGCGCGGCG 

1051 GCGTTTGTCA ACTGGTTCGG CATTATGGCG TTCGGACTGT TTGCCGTGTT 

1101 CCTGTGGACG GGCTTTTTCG CCATGAATTA CGGCTGGCCC GCCAAGCTTG 

1151 CCGAACGCGC CGCCTATTTC AGCCCGTATT ATGTTCCTGA TATCGATCCC 

1201 ATTCCGATGG CGGTTGCCGT ACTGTTCACA CCCTTGTGGC TGTGGGCGAT 

1251 TACCCGGAAA AACATACGCG GCAGGCAGGC GGTTACCAAC TGGGCGGCAG 

1301 GCGTTACCCT GACCTGGGCT TTGCTGATGA CGCTGTTCCT GCCGTGGCTG 

1351 GACGCGGCGA AAAGCCACGC GCCGGTCGTC CGGAGTATGG AGGCATCGCT 

1401 TTCCCCGGAA TTGAAACGGG AGCTTTCAGA CGGCATCGAG TGTATCGGCA 

1451 TAGGCGGCGG CGACCTGCAC ACGCGGATTG TTTGGACGCA GTACGGCACA 

1501 TTGCCGCACC GCGTCGGCGA TGTACAATGC CGCTACCGCA TCGTCCTCCT 

1551 GCCCCAAAAT GCGGATGCGC CGCAAGGCTG GCAGACGGTT TGGCAGGGTG 

1601 CGCGTCCGCG CAACAAAGAC AGTAAGTTCG CACTGATACG GAAAATCGGG 

1651 GAAAATATAT AA 

This corresponds to the amino acid sequence <SEQ ID 596; ORF141-1>:

1 MLTYTPPDAR PPAKTHEKPW LLLLMAFAWL WPGVFSHDLW NPDEPAVYTA

51 VEALAGSPTP LVAHLFGQTD FGIPPVYLWV AAAFKHLLSP WAADSYDAAR

101 FAGVFFAVIG LTSCGFAGFN FLGRHHGRSV VLILIGCIGL IPVAHFLNPA

151 AAAFAAAGLV LHGYSLARRR VIAASFLLGT GWTLMSLAAA YPAAFALMLP

201 LPVLMFFRPW QSRRLMLTAV ASLAFALPLM TVYPLLLAKT QPALFAQWLD

251 YHVFGTFGGV RHVQTAFSLF YYLKNLLWFA LPALPLAVWT VCRTRLFSTD

301 WGILGVVWML AVLVLLAVNP QRFQDNLVWL LPPLALFGAA QLDSLRRGAA

351 AFVNWFGIMA FGLFAVFLWT GFFAMNYGWP AKLAERAAYF SPYYVPDIDP

401 IPMAVAVLFT PLWLWAITRK NIRGRQAVTN WAAGVTLTWA LLMTLFLPWL

451 DAAKSHAPVV RSMEASLSPE LKRELSDGIE CIGIGGGDLH TRIVWTQYGT

501 LPHRVGDVQC RYRIVLLPQN ADAPQGWQTV WQGARPRNKD SKFALIRKIG

551 ENI*
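
The position of the partial ORF141 sequence within the complete ORF141-1 sequence can be found by sliding the fragment along the longer sequence and counting identical residues at each offset. The Python sketch below is illustrative only and is not part of the original disclosure; the function name `best_ungapped_offset` is hypothetical, and only the first 30 residues of ORF141 and the first 100 residues of ORF141-1 are used.

```python
def best_ungapped_offset(fragment: str, full: str) -> tuple[int, int]:
    """Return (0-based offset, match count) of the best ungapped placement.

    The fragment is compared against every window of the same length in
    `full`; the first offset with the highest number of identities wins.
    """
    if not fragment or len(fragment) > len(full):
        raise ValueError("fragment must be non-empty and no longer than full")
    best_offset, best_matches = 0, -1
    for offset in range(len(full) - len(fragment) + 1):
        # zip() stops at the end of the fragment, so this compares one window.
        matches = sum(f == g for f, g in zip(fragment, full[offset:]))
        if matches > best_matches:
            best_offset, best_matches = offset, matches
    return best_offset, best_matches


# First 30 residues of ORF141 <SEQ ID 594> and first 100 of ORF141-1 <SEQ ID 596>.
orf141_start = "DFGISPVYLWVAAAFKHLLSPWAADSYDVA"
orf141_1_start = ("MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPDEPAVYTA"
                  "VEALAGSPTPLVAHLFGQTDFGIPPVYLWVAAAFKHLLSPWAADSYDAAR")

offset, matches = best_ungapped_offset(orf141_start, orf141_1_start)
print(offset + 1, matches)  # 70 28
```

Under these assumptions the fragment aligns from residue 70 of ORF141-1, with 28 of its first 30 residues identical.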






Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF141 shows 95.0% identity over a 140aa overlap with an ORF (ORF141a) from strain A of N. 
meningitidis: 

10 20 30
orf141.pep DFGISPVYLWVAAAFKHLLSPWAADSYDVA
I M I ! I 1 I I ! I II I ! I II I II I t I I i : I
orf141a WNPDEPAVYTAVEALAGSPTPLVAHLFGQIDFGIPPVYLWVAAAFKHLLSPWAADPYDAA
40 50 60 70 80 90

40 50 60 70 80 90
orf141.pep RFAGVFFAVIGLTSCGFAGFNFLGRHHGRXVVLILIGCIGLIPVAHFLNPAAAAFAAAGL
I | | 1 | I I I I : I I I I I I I I I I I ! I I I I ! I I I I i i I I i ! I I I I i : : i 1 ! i I I i I I I ! I I I 1
orf141a RFAGVFFAVVGLTSCGFAGFNFLGRHHGRSVVLILIGCIGLIPTVHFLNPAAAAFAAAGL
100 110 120 130 140 150

100 110 120 130 140
orf141.pep VLHGYSLARRRVIAASFLLGTGWTLMSLAAAYPAAFALMLPLPVLMFFRP
I I I i t I I I t I I I t I II I I I I I M I I I I I I I I I I I I I I I I I I I I I M I I I I
orf141a VLHGYSLARRRVIAASFLLGTGWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTA
160 170 180 190 200 210

orf141a VASLAFALPLMTVYPLLLAKTQPALFAQWLDDHVFGTFGGVRHIQTAFSLFYYLKNLLWF
220 230 240 250 260 270

The complete length ORF141a nucleotide sequence <SEQ ID 597> is:

1 ATGCTGACCT ATACCCCGCC CGATGCCCGC CCGCCCGCCA AAACCCACGA

51 AAAGCCGTGG CTGTTGCTGT TGATGGCGTT TGCCTGGTTG TGGCCCGGCG 

101 TGTTTTCCCA CGATTTGTGG AATCCTGACG AACCTGCCGT CTATACCGCC 

151 GTCGAAGCAC TGGCAGGCAG CCCCACCCCT TTGGTTGCCC ATCTGTTCGG 

201 TCAAATCGAT TTCGGCATAC CGCCCGTGTA TCTTTGGGTT GCCGCCGCGT 

251 TCAAACATTT GCTGTCGCCG TGGGCTGCCG ACCCGTATGA TGCCGCACGC

301 TTTGCCGGCG TGTTTTTCGC CGTTGTCGGA CTGACTTCCT GCGGCTTTGC 

351 CGGTTTCAAC TTTTTGGGCA GACACCACGG GCGCAGCGTC GTCCTGATTC 

401 TCATCGGCTG TATCGGGCTG ATTCCGACCG TACACTTTCT CAACCCCGCT 

451 GCCGCCGCCT TTGCCGCCGC CGGACTGGTG CTGCACGGTT ATTCTTTGGC 

501 TCGCCGGCGC GTGATTGCCG CCTCTTTTCT GCTCGGTACG GGTTGGACGC

551 TGATGTCGTT GGCAGCAGCT TATCCGGCGG CATTTGCCCT GATGCTGCCC 

601 CTGCCCGTGC TGATGTTTTT CCGTCCGTGG CAAAGCAGGC GTTTGATGTT 

651 GACGGCAGTC GCCTCGCTTG CCTTTGCCCT GCCGCTTATG ACCGTTTACC 

701 CGCTGCTCTT GGCAAAAACG CAGCCCGCGC TGTTCGCGCA ATGGCTCGAC 

751 GATCACGTTT TCGGTACGTT CGGCGGCGTG CGGCACATTC AGACGGCATT

801 CAGTTTGTTT TACTATCTGA AAAACCTGCT TTGGTTTGCA TTGCCTGCGC 

851 TGCCGCTGGC GGTTTGGACG GTTTGCCGCA CGCGCCTGTT TTCGACCGAC 

901 TGGGGGATTT TGGGCGTCGT CTGGATGCTT GCCGTTTTGG TGCTGCTTGC 

951 CGTCAATCCG CAGCGTTTTC AGGATAACCT CGTCTGGCTG CTTCCGCCGC 

1001 TTGCCCTGTT CGGCGCGGCG CAACTGGACA GCCTGAGACG CGGCGCGGCG

1051 GCGTTTGTCA ACTGGTTCGG CATTATGGCG TTCGGACTGT TTGCCGTGTT 

1101 CCTGTGGACG GGCTTTTTCG CCATGAATTA CGGCTGGCCC GCCAAGCTTG 

1151 CCGAACGCGC CGCCTATTTC AGCCCGTATT ATGTTCCTGA TATCGATCCC 

1201 ATTCCGATGG CGGTTGCCGT ACTGTTCACA CCCTTGTGGC TGTGGGCGAT 

1251 TACCCGCAAA AACATACGCG GCAGGCAGGC GGTTACCAAC TGGGCGGCAG

1301 GCGTTACCCT GACCTGGGCT TTGCTGATGA CGCTGTTCCT GCCGTGGCTG 

1351 GACGCGGCGA AAAGCCACGC GCCCGTCGTC CGGAGTATGG AGGCATCGCT 

1401 TTCCCCGGAA TTAAAACGGG AGCTTTCAGA CGGCATCGAG TGTATCGACA 

1451 TAGGCGGCGG CGACCTACAC ACGCGGATTG TTTGGACGCA GTACGGCACA 

1501 TTGCCGCACC GCGTCGGCGA TGTACAATGC CGCTACCGCA TCGTCCGCTT

1551 GCCCCAAAAC GCGGATGCGC CGCAAGGCTG GCAGACGGTC TGGCAGGGTG 

1601 CGCGCCCGCG CAACAAAGAC AGTAAGTTCG CACTGATACG GAAAACCGGG 

1651 GAAAATATAT TAAAAACAAC AGATTGA 

This encodes a protein having amino acid sequence <SEQ ID 598>: 

1 MLTYTPPDAR PPAKTHEKPW LLLLMAFAWL WPGVFSHDLW NPDEPAVYTA

51 VEALAGSPTP LVAHLFGQID FGIPPVYLWV AAAFKHLLSP WAADPYDAAR

101 FAGVFFAVVG LTSCGFAGFN FLGRHHGRSV VLILIGCIGL IPTVHFLNPA

151 AAAFAAAGLV LHGYSLARRR VIAASFLLGT GWTLMSLAAA YPAAFALMLP

201 LPVLMFFRPW QSRRLMLTAV ASLAFALPLM TVYPLLLAKT QPALFAQWLD

251 DHVFGTFGGV RHIQTAFSLF YYLKNLLWFA LPALPLAVWT VCRTRLFSTD

301 WGILGVVWML AVLVLLAVNP QRFQDNLVWL LPPLALFGAA QLDSLRRGAA

351 AFVNWFGIMA FGLFAVFLWT GFFAMNYGWP AKLAERAAYF SPYYVPDIDP

401 IPMAVAVLFT PLWLWAITRK NIRGRQAVTN WAAGVTLTWA LLMTLFLPWL

451 DAAKSHAPVV RSMEASLSPE LKRELSDGIE CIDIGGGDLH TRIVWTQYGT

501 LPHRVGDVQC RYRIVRLPQN ADAPQGWQTV WQGARPRNKD SKFALIRKTG

551 ENILKTTD*

ORF141a and ORF141-1 show 98.2% identity in 553 aa overlap:

orf141a.pep MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPDEPAVYTAVEALAGSPTP
I 1 I I I I i I I i I I I I I M i i I i i I I I I I I M I I I It I I I I I I I I I I I I I I t I I I I II I I I I
orf141-1 MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPDEPAVYTAVEALAGSPTP

orf141a.pep LVAHLFGQIDFGIPPVYLWVAAAFKHLLSPWAADPYDAARFAGVFFAVVGLTSCGFAGFN
I I I II I I I I I i I II I I I I I II I I I I I I I I II I I I I! I I I I I ! I I I 1 : I I I I I I I I I I I
orf141-1 LVAHLFGQTDFGIPPVYLWVAAAFKHLLSPWAADSYDAARFAGVFFAVIGLTSCGFAGFN

orf141a.pep FLGRHHGRSVVLILIGCIGLIPTVHFLNPAAAAFAAAGLVLHGYSLARRRVIAASFLLGT
I M t I I I! I I I I I I I I I I I I I I I I I I I I I I I I I I I II II I I I I I I I I I I I I I I I I I I I I
orf141-1 FLGRHHGRSVVLILIGCIGLIPVAHFLNPAAAAFAAAGLVLHGYSLARRRVIAASFLLGT

orf141a.pep GWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTAVASLAFALPLMTVYPLLLAKT
I I ! j I M I I I I I II ! I I I I I I I I i I I I I I I i I I I I I I I I II I I I I I I I I I I I I I I I I II I
orf141-1 GWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTAVASLAFALPLMTVYPLLLAKT

orf141a.pep QPALFAQWLDDHVFGTFGGVRHIQTAFSLFYYLKNLLWFALPALPLAVWTVCRTRLFSTD
I I I I I I I I I I I I I M I I I II I : I I I I I I I 1 I I I I I I I II I I I I M II I I I I I I II I I I I
orf141-1 QPALFAQWLDYHVFGTFGGVRHVQTAFSLFYYLKNLLWFALPALPLAVWTVCRTRLFSTD

orf141a.pep WGILGVVWMLAVLVLLAVNPQRFQDNLVWLLPPLALFGAAQLDSLRRGAAAFVNWFGIMA
I I I I I I I I I I 1 I I I I II I I I I I I I I II I I I I I ! I II I I I I I I I I I I I I I I I I I I I M I I I
orf141-1 WGILGVVWMLAVLVLLAVNPQRFQDNLVWLLPPLALFGAAQLDSLRRGAAAFVNWFGIMA

orf141a.pep FGLFAVFLWTGFFAMNYGWPAKLAERAAYFSPYYVPDIDPIPMAVAVLFTPLWLWAITRK
I I I I II I I I I I I II II i I I I I I I I I I I I I I M M M I I I I I i I I I I I I I I I I II I I I I I I
orf141-1 FGLFAVFLWTGFFAMNYGWPAKLAERAAYFSPYYVPDIDPIPMAVAVLFTPLWLWAITRK

orf141a.pep NIRGRQAVTNWAAGVTLTWALLMTLFLPWLDAAKSHAPVVRSMEASLSPELKRELSDGIE
I I I 1 1 I I I I I I I I I I M I II I I I I I I I I II I I I I I I I II I I II I 1 I I I I I I I I I 1 I I I I I
orf141-1 NIRGRQAVTNWAAGVTLTWALLMTLFLPWLDAAKSHAPVVRSMEASLSPELKRELSDGIE

orf141a.pep CIDIGGGDLHTRIVWTQYGTLPHRVGDVQCRYRIVRLPQNADAPQGWQTVWQGARPRNKD
II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
orf141-1 CIGIGGGDLHTRIVWTQYGTLPHRVGDVQCRYRIVLLPQNADAPQGWQTVWQGARPRNKD

orf141a.pep SKFALIRKTGENI
I I I II I I I I II I
orf141-1 SKFALIRKIGENI

Homology with a predicted ORF from N. gonorrhoeae 

ORF141 shows 95% identity over a 140aa overlap with a predicted ORF (ORF141ng) from
N. gonorrhoeae:

orf141.pep DFGISPVYLWVAAAFKHLLSPWAADSYDVA 30
I I I I I I I I I I I I I I I I I M I I I I I I : i
orf141ng WNPAEPAVYTAVEALAGSPTPLVAHLFGQTDFGIPPVYLWVAAAFKHLLSPWAAHPYDAA 126

orf141.pep RFAGVFFAVIGLTSCGFAGFNFLGRHHGRXVVLILIGCIGLIPVAHFLNPAAAAFAAAGL 90
M I I I I I I I I II II I I I I I I I I I I I I I I I I I I I I I I I I 1 M I I II : II I I I I I M I I I
orf141ng RFAGVFFAVIGLTSCGFAGFNFLGRHHGRSVVLIHIGCIGLIPVAHFFNPAAAAFAAAGL 186

orf141.pep VLHGYSLARRRVIAASFLLGTGWTLMSLAAAYPAAFALMLPLPVLMFFRP 140
I I I M I I M i II I II I I I I I I I I II M I I I I I I I I I I I I I I I I ] I I I I I I
orf141ng VLHGYSLARRRVIAASFLLGTGWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTA 246
An ORF141ng nucleotide sequence <SEQ ID 599> was predicted to encode a protein having amino
acid sequence <SEQ ID 600>:

1 MPSEAVSARP LCEYLLHLAI RPFLLTLMLT YTPPDARPPA KTHEKP WLLL 

51 LMAFAWLWPG VFS HDLWMPA EPAVYTAVEA LAGSPTPLVA HLFGQTDFGI

101 PPVYLWVAAA FKHLLSPWAA HPYDAAR FAG VFFAVIGLTS CGFA GFNFLG 

151 RHHGRS VVLI HIGCIGLIPV AHF FNPAAAA FAAAGLVLHG YSLARRRVIA 

201 AS FLLGTGWT LMSL AA AYPA AFALMLPLPV LMFF RPWQSR RL MLTAVASL 

251 AFALPLMTV Y PLLLAKTQPA LFAQWLNYHV FGTFGGVRHI QRAFSLFHYL 

301 KNLLWFAPPG LPLAVWTVCR TRLFSTDW GI LGIVWMLAVL VLLAF NPQRF 

351 QDNLVWLLPP LALFGAAQLD SLRRGAAAFV NWFG IMAFGL FAVFLWTGFF 

401 AMNYGWPAKL AERAAYFSPY YVPDIDP IPM AVAVLFTPLW LWAI TRKNIR 

451 GRQAVTN WAA GVTLTWALLM TLFL PWLDAA KSHAPVVRSM EASFSPELKR

501 ELSDGIECIG IGGGDLHTRI VWTQYGTLPH RVGDVRCRYR IVRLPQNADA 

551 PQGWQTVWQG ARPRNKDSKF ALIRKIGENI LKTTD* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 601>: 

1 ATGCTGACCT ATACCCCGCC CGATGCCCGC CCGCCCGCCA AAACCCACGA 

51 AAAACCGTGG CTGCTGCTGT TGATGGCGTT TGCCTGGCTG TGGCCCGGCG 

101 TGTTTTCCCA CGATTTGTGG AATCCTGCCG AACCTGCCGT CTATACCGCC 

151 GTCGAAGCAC TGGCAGGCAG CCCCACCCCC TTGGTTGCCC ATCTGTTCGG 

201 TCAAACCGAT TTCGGCATAC CGCCCGTGTA TCTTTGGGTT GCCGCCGCAT 

251 TCAAACATTT GCTGTCGCCG TGGGCAGCCG ACCCGTATGA TGCCGCACGC 

301 TTTGCAGGCG TATTTTTTGC CGTTATCGGA CTGACTTCTT GCGGCTTTGC 

351 CGGTTTCAAC TTTTTGGGCA GACACCACGG GCGCAGCGTT GTTTTAATCC 

401 ATATCGGCTG TATCGGGCTG ATTCCGGTTG CCCATTTCCT CAATCCcgcc 

451 gccgccgcct tTGCCGCCGC CGGACTGGTG CTGCacggct actcgctgGC 

501 ACGCCGGCGC GTGATtgccg cctctTtccT GCTCGGTACG GGTTGGACGT 

551 TGATGTCGCT GGCGGCAGCT TATCCGGCGG CGTTTGCGCT GATGCTGCCC 

601 CTGCCCGTGC TGATGTTTTT CCGTCCGTGG CAAAGCAGGC GTTTGATGTT 

651 GACGGCAGTC GCCTCGCTTG CCTTTGCCCT GCCGCTTATG ACCGTTTACC 

701 CGCTGCTCtt gGCAAAAACG CAGCCCGCGC TGTTTGCGCA ATGGCTCAAC 

751 TATCACGTTT TCGGTACGTt cggcgGCGTG CGGCAcaTTC AGAggGCatT 

801 Cagtttgttt cactatctgA AAaatctgct ttggttcgca ccgcccgggC 

851 TGCCGCTGGC GGTTTGGACG GTTTGCCGCA CACGCCTGTT TTCGACCGAC 

901 TGGGGGATTT TGGGCATTGT CTGGATGCTT GCCGTTTTGG TGCTGCTCGC 

951 CTTTAATCCG CAGCGTTTTC AAGACAACCT CGTCTGGCTG CTGCCGCCGC 

1001 TTGCCCTGTT CGGCGCGGCG CAACTGGACA GCCTGAGGCG CGGCGCGGCG 

1051 GCTTTTGTCA ACTGGTTCGG CATTATGGCG TTCGGGCTGT TTGCCGTGTT 

1101 CCTGTGGACG GGCTTTTTCG CCATGAATTA CGGCTGGCCC GCCAAGCTTG 

1151 CCGAACGCGC CGCCTACTTC AGCCCGTATT ACGTTCCCGA CATCGATCCC 

1201 ATTCCGATGG CGGTTGCCGT ACTGTTCACA CCCTTGTGGC TGTGGGCGAT 

1251 TACCCGGAAA AACATACGCG GCAGGCAGGC GGTTACCAAC TGGGCGGCAG 

1301 GCGTTACCCT GACCTGGGCT TTGCTGATGA CGCTGTTCCT GCCGTGGCTG 

1351 GACGCGGCGA AAAGCCACGC GCCCGTCGTC CGGAGTATGG AGGCATCGTT 

1401 TTCCCCGGAA TTAAAACGGG AGCTTTCAGA CGGCATCGAG TGTATCGGCA 

1451 TAGGCGGCGG CGACCTGCAC ACGCGGATTG TTTGGACGCA GTACGGCACA 

1501 TTGCCGCACC GCGTCGGCGA TGTCCGTTGC CGCTACCGTA TCGTCCGCCT 

1551 GCCCCAAAAC GCGGATGCGC CGCAAGGCTG GCAGACGGTC TGGCAGGGTG 

1601 CGCGCCCGCG CAACAAAGAC AGTAAGTTTG CACTGATACG GAAAATCGGG 

1651 GAAAATATAT TAAAAACAAC AGATTGA 

This corresponds to the amino acid sequence <SEQ ID 602; ORF141ng-1>:

1 MLTYTPPDAR PPAKTHEKPW LLLLMAFAWL WPGVFS HDLW NPAEPAVYTA 

51 VEALAGSPTP LVAHLFGQTD FGIPPVYLWV AAAFKHLLSP WAADPYDAAR 

101 FAGVFFAVIG LTSCGFA GFN FLGRHHGRS V VLIHIGCIGL IPVAHF LNPA 

151 AAAFAAAGLV LHGYSLARRR VIAASFLLGT GWTLMSL AAA YPAAFALMLP 

201 LPVLMFFRPW QSRRL MLTAV ASLAFALPLM TV YPLLLAKT QPALFAQWLN 

251 YHVFGTFGGV RHIQRAFSLF HYLKNLLWFA PPGLPLAVWT VCRTRLFSTD 

301 W GILGIVWML AVLVLLAF NP QRFQDNLVWL LPPLALFGAA QLDSLRRGAA 

351 AFVNWFG IMA FGLFAVFLWT GFFA MNYGWP AKLAERAAYF SPYYVPDIDP 

401 IPMAVAVLFT PLWLWAI TRK NIRGRQAVTN WAAGVTLTWA LLMTLFL PWL 

451 DAAKSHAPVV RSMEASFSPE LKRELSDGIE CIGIGGGDLH TRIVWTQYGT

501 LPHRVGDVRC RYRIVRLPQN ADAPQGWQTV WQGARPRNKD SKFALIRKIG 

551 ENILKTTD* 
ORF141ng-1 and ORF141-1 show 97.5% identity in 553 aa overlap:

orfl41ng-l pep MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWN PAEPAVYTAVEALAGSPTP 

I I I 1 1 1 I I I I I 1 1 1 1 M M I N 1 i I I I I I 1 ! M M I 1 1 I I M I I I I I I I I I II I ! I I I I 
orf 141-1 MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPDEPAVYTAVEALAGSPTP 

5 

orf 141ng-l . pep L VAH LFGQTDFGIP P V Y LW V AAA FKH LLS P W AAD P Y D AARF AGV F F AV I G LT S CG FAG FN 
M II I M I I I II I i i II I i f I I t I II I I I I I I i I II II I I I I M I II 11 I I II I I I I I i 
orf 14 1-1 L VAHL FGQT D FG I P PV YL W VAAAFKHL L S PWAAD S YDAAR FAGV F FAV I GLT S CG FAG FN 

10 orf 141ng-l .pep FLGRHHGRSWLIHIGCIGLIPVAHFLNPAAAAFAAAGLVLHGYSLARRRVIAASFLLGT 

II I I II I I I I I I I I I M I I I I I II I I I M I I ! I I M I I I I II I II I I M I I I ! II I I I I ' 
orf 14 1-1 FLGRHHGRSVVLILIGCIGLIPVAHFLNPAAAAFAAAGLVLHGYSLARRRVIAASFLLGT 

orf 14 lng-1 . pep GWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTAVASLAFALPLMTVYPLLLAKT 
15 M II II I I I I M I I II I I M I M I 1 I 1 I I I I I I I I I I I I M I I I I I I I I I II I 1 M 1 II I 

orf 141-1 GWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTAVASLAFALPLMTVYPLLLAKT 

orf 14 lng-1. pep QPALFAQWLNYHVFGTFGGVRHIQRAFSLFHYLKNLLWFAPPGLPLAVWTVCRTRLFSTD 
I I I I I I II I : II 1 I I I 1 I I I I I : I I 1 I M : I I ) II I I I I I : I I I I II I I I M I I I II 1 
20 orf 14 1-1 QPALFAQWLDYHVFGTFGGVRHVQTAFSLFYYLKNLLWFALPALPLAVWTVCRTRLFSTD 

orf 14 lng-1. pep WGILGIVWMLAVLVLLAFNPQRFQDNLVWLLPPLALFGAAQLDSLRRGAAAFVNWFGIMA 
I I I M : M II M i I I M I I I I I I I I I t I I I I I I I I M I II I I II I M I I I It I I M II I 
orf 14 1-1 WG I LG VVWML AVL V LL AVN PQR FQDN L VW L L P P L AL FG AAQ L D S L RRG AAAFVN W FG IMA 

25 

orf 141ng-l . pep FGLFAVFLWTGFFAMNYGWPAKLAERAAYFSPYYVPDIDPIPMAVAVLFTPLWLWAITRK 

I I I I I I I II I M I II I II I I M I I I I I I I I I I I I I I II I I I I I I I I I I M I I I I I I II I I 
orf 141-1 FGLFAVFLWTGFFAMNYGWPAKLAERAAYFSPYYVPDIDPIPMAVAVLFTPLWLWAITRK 

30 orf 141ng-l .pep N I RGRQAVTN WAAG VT LT WAL LMT L FL PWL DAAK S HAP WR S ME AS FS PE LKRE L S DG I E 

II 1 I I 1 I I I I I I I I I I I I I I I II 1 I I I II I I I I I I I I II I 1 I I I I I : I I I I I M I I I I I I 
orfl41-l N I RGRQAVTN WAAG VT LT WAL LMT L FL P WL D AAKS HAP WRS ME AS L S PE LKRE L S D G I E 

orf 14 lng-1 . pep CIGIGGGDLHTRIVWTQYGTLPHRVGDVRCRYRIVRLPQNADAPQGWQTVWQGARPRNKD 
35 I I I I I I I I I I I I I I I I I I I I I M I I I I I : II I I I I II I i I I I I I M I I I M M I I I M I 

orf 14 1-1 CIGIGGGDLHTRIVWTQYGTLPHRVGDVQCRYRIVLLPQNADAPQGWQTVWQGARPRNKD 

orfl41ng-l .pep SKFALIRKIGENILKTTDX 

I I M I I II I I 11 I 
40 orfl41-l SKFALIRKIGENIX 

Based on the presence of several putative transmembrane domains in the gonococcal protein, it is
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 72 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 603>:

1 . . CAATCCGCCA AATGGTTATC GGGCCAAACT CTAGTCGGCA CAGCAATTGG 

51 GATACGCGGG CAGATAAAGC TTGGCGGCAA CCTGCATTAC GATATATTTA 

101 CCGGCCGCGC ATTGAAAAAG CCCGAATTTT TCCAATCAAG GAAATGGGCA 

151 AGCGGTTTTC AGGTAGGCTA TACGTTTTAA 

This corresponds to the amino acid sequence <SEQ ID 604; ORF142>:

1 ..QSAKWLSGQT LVGTAIGIRG QIKLGGNLHY DIFTGRALKK PEFFQSRKWA
51 SGFQVGYTF*
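
The correspondence between a nucleotide sequence and its amino acid sequence can be checked with a short script. The following is a minimal sketch, assuming the standard genetic code (NCBI translation table 1) and translation in the first reading frame; the names `CODON_TABLE` and `translate` are illustrative and do not come from this specification. Codons containing ambiguous bases are rendered as `X`.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), with codons
# enumerated in T, C, A, G order for each of the three positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; whitespace is ignored, unknown codons give 'X'."""
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


# Start and end of the partial sequence <SEQ ID 603>, in frame.
print(translate("CAATCCGCCA AATGGTTATC G"))  # QSAKWLS
print(translate("GGCTATACGTTTTAA"))          # GYTF*
```

The two printed fragments agree with the start (QSAKWLS) and end (GYTF*) of the ORF142 amino acid sequence.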

Further work revealed the complete nucleotide sequence <SEQ ID 605>: 

1 ATGGATAATT CGGGTAGTGA GGCGACAGGA AAATACCAAG GAAATATCAC 
51 TTTCTCTGCC GACAATCCTT TGGGACTGAG TGATATGTTC TATGTAAATT
101 ATGGACGTTC GATTGGCGGT ACGCCCGATG AGGAAAGTTT TGACGGCCAT

151 CGCAAAGAAG GCGGATCAAA CAATTACGCC GTACATTATT CAGCCCCTTT 

201 CGGTAAATGG ACATGGGCAT TCAATCACAA TGGCTACCGT TACCATCAGG 

251 CAGTTTCCGG ATTATCGGAA GTCTATGACT ATAATGGAAA AAGTTACAAT 

301 ACTGATTTCG GCTTCAACCG CCTGTTGTAT CGTGATGCCA AACGCAAAAC 

351 CTATCTCGGT GTAAAACTGT GGATGAGGGA AACAAAAAGT TACATTGATG 

401 ATGCCGAACT GACTGTACAA CGGCGTAAAA CTGCGGGTTG GTTGGCAGAA 

451 CTTTCCCACA AAGAATATAT CGGTCGCAGT ACGGCAGATT TTAAGTTGAA

501 ATATAAACGC GGCACCGGCA TGAAAGATGC TCTGCGCGCG CCTGAAGAAG 

551 CCTTTGGCGA AGGCACGTCA CGTATGAAAA TTTGGACGGC ATCGGCTGAT 

601 GTAAATACTC CTTTTCAAAT CGGTAAACAG CTATTTGCCT ATGACACATC 

651 CGTTCATGCA CAATGGAACA AAACCCCGCT AACATCGCAA GACAAACTGG 

701 CTATCGGCGG ACACCACACC GTACGTGGCT TCGACGGTGA AATGAGTTTG 

751 TCTGCCGAGC GGGGATGGTA TTGGCGCAAC GATTTGAGCT GGCAATTTAA 

801 ACCAGGCCAT CAGCTTTATC TTGGGGCTGA TGTAGGACAT GTTTCAGGAC 

851 AATCCGCCAA ATGGTTATCG GGCCAAACTC TAGTCGGCAC AGCAATTGGG 

901 ATACGCGGGC AGATAAAGCT TGGCGGCAAC CTGCATTACG ATATATTTAC

951 CGGCCGCGCA TTGAAAAAGC CCGAATTTTT CCAATCAAGG AAATGGGCAA 

1001 GCGGTTTTCA GGTAGGCTAT ACGTTTTAA 

This corresponds to the amino acid sequence <SEQ ID 606; ORF142-1>:



1 MDNSGSEATG KYQGNITFSA DNPLGLSDMF YVNYGRSIGG TPDEESFDGH 

51 RKEGGSNNYA VHYSAPFGKW TWAFNHNGYR YHQAVSGLSE VYDYNGKSYN 

101 TDFGFNRLLY RDAKRKTYLG VKLWMRETKS YIDDAELTVQ RRKTAGWLAE 

151 LSHKEYIGRS TADFKLKYKR GTGMKDALRA PEEAFGEGTS RMKIWTASAD 

201 VNTPFQIGKQ LFAYDTSVHA QWNKTPLTSQ DKLAIGGHHT VRGFDGEMSL 

251 SAERGWYWRN DLSWQFKPGH QLYLGADVGH VSGQSAKWLS GQTLVGTAIG 

301 IRGQIKLGGN LHYDIFTGRA LKKPEFFQSR KWASGFQVG Y TF * 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. gonorrhoeae

ORF142 shows 88.1% identity over a 59aa overlap with a predicted ORF (ORF142ng) from
N. gonorrhoeae:

orf142.pep                               QSAKWLSGQTLVGTAIGIRGQIKLGGNLHY  30
                                         |||||||||||:||||||||||||||||||
orf142ng   RGWYWRNDLSWQFKPGHQLYLGADVGHVSGQSAKWLSGQTLAGTAIGIRGQIKLGGNLHY  313

orf142.pep DIFTGRALKKPEFFQSRKWASGFQVGYTF  59
           ||||||||||||:||::||::||||||:|
orf142ng   DIFTGRALKKPEYFQTKKWVTGFQVGYSF  342

The complete length ORF142ng nucleotide sequence <SEQ ID 607> is: 



1 ATGGATAATT CGGGTAGTGA GGCGACAGGA AAATACCAAG GAAATATCAC 

51 TTTCTCTGCC GACAATCCTT TTGGACTGAG TGATATGTTC TATGTAAATT 

101 ATGGACGTTC AATTGGCGGT ACGCCCGATG AGGAAAATTT TGACGGCCAT 

151 CGCAAAGAAG GCGGATCAAA CAATTACGCC GTACATTATT CAGCCCCTTT 

201 CGGTAAATGG ACATGGGCAT TCAATCACAA TGGCTACCGT TACCATCAGG 

251 CGGTTTCCGG ATTATCGGAA GTCTATGACT ATAATGGAAA AAGTTACAAC 

301 ACTGATTTCG GCTTCAACCG CCTGTTGTAT CGTGATGCCA AACGCAAAAC 

351 CTATCTCAGT GTAAAACTGT GGACGAGGGA AACAAAAAGT TACATTGATG 

401 ATGCCGAACT GACTGTACAA CGGCGTAAAA CCACAGGTTG GTTGGCAGAA 

4 51 CTTTCCCACA AAGGATATAT CGGTCGCAGT ACGGCAGATT TTAAGTTGAA 

501 ATATAAACAC GGCACCGGCA TGAAAGATGC TCTGCGCGCG CCTGAAGAAG 

551 CCTTTGGCGA AGGCACGTCA CGTATGAAAA TTTGGACGGC ATCGGCTGAT 

601 GTAAATACTC CTTTTCAAAT CGGTAAACAG CTATTTGCCT ATGACACATC 

651 CGTTCATGCA CAATGGAACA AAACCCCGCT AACATCGCAA GACAAACTGG 

7 01 CTATCGGCGG ACACCACACC GTACGTGGCT TCGACGGTGA AATGAGTTTG 

751 CCTGCCGAGC GGGGATGGTA TTGGCGCAAC GATTTGAGCT GGCAATTTAA 

801 ACCAGGCCAT CAGCTTTATC TTGGGGCTGA TGTAGGACAT GTTTCAGGAC 

851 AATCCGCCAA ATGGTTATCG GGCCAAACTC TAGCCGGCAC AGCAATTGGG 

901 ATACGCGGGC AGATAAAGCT TGGCGGCAAC CTGCATTACG ATATATTTAC 

951 CGGCCGTGCA TTGAAAAAGC CCGAATATTT TCAGACGAAG AAATGGGTAA 
1001 CGGGGTTTCA GGTGGGTTAT TCGTTTTGA

This encodes a protein having amino acid sequence <SEQ ID 608>: 

1 MDNSGSEATG KYQGNITFSA DNPFGLSDMF YVNYGRSIGG TPDEENFDGH 

51 RKEGGSNNYA VHYSAPFGKW TWAFNHNGYR YHQAVSGLSE VYDYNGKSYN 

101 TDFGFNRLLY RDAKRKTYLS VKLWTRETKS YIDDAELTVQ RRKTTGWLAE

151 LSHKGYIGRS TADFKLKYKH GTGMKDALRA PEEAFGEGTS RMKIWTASAD 

201 VNTPFQIGKQ LFAYDTSVHA QWNKTPLTSQ DKLAIGGHHT VRGFDGEMSL 

251 PAERGWYWRN DLSWQFKPGH QLYLGADVGH VSGQSAKWLS GQTLAGTAIG 

301 IRGQIKLGGN LHYDIFTGRA LKKPEYFQTK KWVTGFQVG Y SF * 

The underlined sequence (aromatic-Xaa-aromatic amino acid motif) is usually found at the
C-terminal end of outer membrane proteins.
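
The presence of this motif can be tested programmatically. The sketch below is illustrative only: it assumes that "aromatic" means phenylalanine, tryptophan or tyrosine (F, W, Y), and the function name `has_c_terminal_aromatic_motif` is hypothetical. A trailing `*` (stop symbol) and any whitespace are ignored.

```python
# Assumption: aromatic residues are taken to be F, W and Y only.
AROMATIC = frozenset("FWY")


def has_c_terminal_aromatic_motif(protein: str) -> bool:
    """Return True if the last three residues are aromatic-Xaa-aromatic."""
    # Remove whitespace and a trailing stop symbol, if present.
    seq = "".join(protein.split()).upper().rstrip("*")
    if len(seq) < 3:
        return False
    return seq[-3] in AROMATIC and seq[-1] in AROMATIC


# C-terminal residues of <SEQ ID 608> (ORF142ng) and <SEQ ID 606> (ORF142-1).
print(has_c_terminal_aromatic_motif("KWVTGFQVG Y SF *"))  # True
print(has_c_terminal_aromatic_motif("KWASGFQVG Y TF *"))  # True
```

Both the gonococcal (Y-S-F) and the meningococcal (Y-T-F) C-termini satisfy the pattern under this definition.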



ORF142ng and ORF142-1 show 95.6% identity over 342aa overlap:

orf 142-1. pep MDNSGSEATGKYQGNITFSADNPLGLSDMFYVNYGRSIGGTPDEESFDGHRKEGGSNNYA 
I I I I j| I M I I I t I I I I I t I f I I : M i t I I I M I I I I I t I I I I I t : I I I t I I I I I I I 1 1 I 
15 orfl42ng-l MDNSGSEATGKYQGNITFSADNPFGLSDMFYVNYGRSIGGTPDEENFDGHRKEGGSNNYA 

orf 142-1 . pep VHYSAPFGKWTWAFNHNGYRYHQAVSGLSEVYDYNGKSYNTDFGFNRLLYRDAKRKTYLG 
I ! I | M 1 || ! I I t I I I i i M I I I I ! I I I I I i f I 1 i I i I I 1 I i I M I 1 f I I II I M I i I I : 
orfl4 2ng-l VHYSAPFGKWTWAFNHNGYRYHQAVSGLSEVYDYNGKSYNTDFGFNRLLYRDAKRKTYLS 

orf 142-1 . pep VKLWMRETKSYIDDAELTVQRRKTAGWLAELSHKEYIGRSTADFKLKYKRGTGMKDALRA 
— I I I I I i I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I II I I I I I I! I I : I I I I t I I I I I 

Ul orf 142ng-l VKLWTRETKS Y I DDAELTVQRRKTTGWLAELSHKGY I GRSTADFKLKYKHGTGMKDALRA 

25 orf 142-1 .pep PEEAFGEGTSRMKIWTASADVNTPFQIGKQLFAYDTSVHAQWNKTPLTSQDKLAIGGHHT 

^ I I I I I I I M I I I I I I ! I I I I I I M I I I I I I I I II I I I M I M I I I II II I I I I I II I I I I 

^ orf 142ng-l PEEAFGEGTSRMKIWTASADVNTPFQIGKQLFAYDTSVHAQWNKTPLTSQDKLAIGGHHT 

E orf 142-1 . pep VRGFDGEMSLSAERGWYWRNDLSWQFKPGHQLYLGADVGHVSGQSAKWLSGQTLVGTAIG 

r 30 I I I I II I I i I I I I i M I I I 1 I I ! I I I I I II I I I I I I I I I I I I 1 l I M I I I I i I : ! I I I I 

orfl42ng-l VRGFDGEMSLPAERGWYWRNDLSWQFKPGHQLYLGADVGHVSGQSAKWLSGQTLAGTAIG 

orf 142-1 . pep IRGQIKLGGNLHYDIFTGRALKKPEFFQSRKWASGFQVGYTF 
: 2 I II I I I I I I I I I M ! I I I I M I I I i : I I :: I I :: I I I I I I : I 

^ 35 orfl42ng-l IRGQIKLGGNLHYDIFTGRALKKPEYFQTKKWVTGFQVGYSF 

In addition, ORF142ng is homologous to the HecB protein of E. chrysanthemi:

gi|1772622 (L39897) HecB [Erwinia chrysanthemi] Length = 558
Score = 119 bits (295), Expect = 3e-26
Identities = 88/346 (25%), Positives = 151/346 (43%), Gaps = 22/346 (6%)

Query: 2   DNSGSEATGKYQGNITFSADNPFGLSDMFYVNYGRSIGGTPDEENFDGHRKEGGSNNYAV 61
           DNSG ++TG+ Q N + + DN FGL+D ++++ G S + + D + G
Sbjct: 230 DNSGQKSTGEEQLNGSLALDNVFGLADQWFISAGHS SRFATSHDAESLQAG 280

Query: 62
           +S P+G W +N++ RY + G S F +R+++RD KT ++
Sbjct: 281

Query: 122
           R +Y++ 4- L RK + ++H + A F Y G
Sbjct: 340

Query: 182
           +++ E + WT SA P Y S-f+ Q++ L ++L +GG ++
Sbjct: 400

Query: 242
           RGF E RG YWRN+L+WQ G+ ++ A D GH+ + +L G
Sbjct: 457

Query: 297 TAIGIRGQIKLGGNLHYDIFTGRALKKPEYFQTKKWVTGFQVGYSF 342
           A+G+ + L +G+P+Q V G++VG SF
Sbjct: 516 GAVGMTVASRW LSQQVTVGWPISYPAWLQFDTMVVGYRVGLSF 558

On the basis of this analysis, it is predicted that the proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.



Example 73 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 609>: 

1 ATGCGGACGA AATGGTCAGC AGTGAGAAGC TGCTXACTTG GgCGGACACC 

51 GCCGACATCG ATACCGCTTT GAACCTGTTG TACCGTTTGC AAAAACTCGA 

101 ATTCCTCTAT GGCGATGAAA ACGGTCATTC AGACGGCATC AATTTGwCGG 

151 ACGAGCAATT GCCGTTGCTG ATGGAACAAT TGTCCGGCAG CGGTAAGGCG 

201 TTATTGGTCG ATCGGAACGG TCTGTATCTT GCCAACGCCA ATTTCCATCA 

251 TGAGGCGGCG GAAGAGTTGG GGTTGTTGGC GGCAGAAGTC GCACAGATGG 

301 AAAAGAAATA CCGGCTGCTG ATTAAGAACA AC. . 

This corresponds to the amino acid sequence <SEQ ID 610; ORF143>: 



1 MRTKWSAVRS CTWADTADID TALNLLYRLQ KLEFLYGDEN GHSDGINLXD
51 EQLPLLMEQL SGSGKALLVD RNGLYLANAN FHHEAAEELG LLAAEVAQME 
101 KKYRLLIKNN . . 

Further work revealed the complete nucleotide sequence <SEQ ID 611>:

1 ATGGAATCAA CACTTTCACT ACAAGCAAAT TTATATCCCC GCCTGACTCC

51 TGCCGGTGCA TTTTATGCCG TATCCAGCGA TGCCCCCAGT GCCGGTAAAA 

101 CTTTGTTGCA CAGCCTGTTG AAAGCAGATG CGGACGAAAT GGTCAGCAGT 

151 GAGAAGCTGC TTACTTGGGC GGACACCGCC GACATCGATA CCGCTTTGAA 

201 CCTGTTGTAC CGTTTGCAAA AACTCGAATT CCTCTATGGC GATGAAAACG 

251 GTCATTCAGA CGGCATCAAT TTGTCGGACG AGCAATTGCC GTTGCTGATG 

301 GAACAATTGT CCGGCAGCGG TAAGGCGTTA TTGGTCGATC GGAACGGTCT 

351 GTATCTTGCC AACGCCAATT TCCATCATGA GGCGGCGGAA GAGTTGGGGT 

4 01 TGTTGGCGGC AGAAGTCGCA CAGATGGAAA AGAAATACCG GCTGCTGATT 

4 51 AAGAACAACC TGTATATCAA CAATAACGCT TGGGGCGTTT GCGATCCTTC 

501 CGGTCAGAGC GAATTGACAT TTTTCCCATT GTATATCGGT TCAACCAAAT 

551 TTATTTTGGT TATCGGCGGC ATTCCCGATT TGGGCAAAGA GGCATTTGTT 

601 ACTTTGGTAA GGATTTTATA CCGCCGTTAC AGCAACCGCG TGTAA 

This corresponds to the amino acid sequence <SEQ ID 612; ORF143-1>:



1 MESTLSLQAN LYPRLTPAGA FYAVSSDAPS AGKTLLHSLL KADADEMVSS 

51 EKLLTWADTA DIDTALNLLY RLQKLE FLYG DENGHSDGIN LSDEQLPLLM 

101 EQLSGSGKAL LVDRNGLYLA NANFHHEAAE ELGLLAAEVA QMEKKYRLLI 

151 KNNLYINNNA WGVCDPSGQS ELT FFPLYIG STKFILVIGG IPDLGKEAFV 

201 TLVRILYRRY SNRV* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A)

ORF143 shows 92.4% identity over a 105aa overlap with an ORF (ORF143a) from strain A of N.
meningitidis:



10 20 30 

orf 1 4 3 . pep MRTKWSAVRSCTWADTADIDTALNLLYRLQKLEFL 

I : : Ml t t t I I I I I I I I I M I I I I M 
o r f 1 4 3 a GAFYAVS S DXPS AGKTLLHSLLKADADEMVS SEKLLTWAXTADI DTALNLLYRLQKLE FL 

20 30 40 50 60 70 
40 50 60 70 80 90

or f 1 4 3 Pep YGDENGHSDGINLXDEQLPLLMEQLSGSGKALLVDRNGLYLANANFHHEAAEELGLLAAE 

MIMMIIMII I I U I I I I M I M H I I M ! I M I 1 I I I M I II I M M U M M M 
orfl4 3a YGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLANANFHHEAAEELGLLAAE 

80 90 100 110 120 130 

100 110 
orf 14 3 . pep VAQMEKKYRLL I KNN 
I I I I I I I I M MM 

orfl4 3a VAOMF.KKYRLXIKNNLYINNNAWGVCDPSGQSELT FFPLYIGSTKFILVIGG IPDLGKEA 
140 150 160 170 180 190 

The complete length ORF143a nucleotide sequence <SEQ ID 613> is:

1 ATGGAATCAA CANTTTCACT ACAAGCAAAT TTATATCNCC GCCTGACTCC

51 TGCCGGTGCA TTTTATGCCG TATCCAGCGA TGNCCCCAGT GCCGGTAAAA 

101 CTTTGTTGCA CAGCCTGTTG AAAGCGGATG CGGACGAAAT GGTNAGCAGT 

151 GAGAAGCTGC TTACCTGGGC GGANACCGCC GACATCGATA CCGCTTTGAA 

201 CCTGTTGTAC CGTTTGCAAA AACTCGAATT CCTCTATGGC GATGAAAACG 

251 GTCATTCAGA CGGCATCAAT TTGTCGGACG AGCAATTGCC GTTGCTGATG 

301 GAACAATTGT CCGGCAGCGG TAAGGCGTTA TTGGTCGATC GGAACGGTCT 

351 GTATCTTGCC AACGCCAATT TCCATCATGA GGCGGCGGAA GAGTTGGGGT 

4 01 TGTTGGCGGC AGAAGTCGCA CAGATGGAAA AGAAATACCG GCTGCNNATT 

4 51 AAGAACAACC TGTATATCAA CAATAACGCT TGGGGCGTTT GCGATCCTTC 

501 CGGTCAGAGC GAATTGACAT TTTTCCCATT GTATATCGGT TCAACCAAAT 

551 TTATTTTGGT TATCGGCGGC ATTCCCGATT TGGGCAAAGA GGCATTTGTT 

601 ACTTTGGTAA GGATNTTATA CCNCCNGTTA CAGCAACCGC GTGTAAAACT 

651 TGGGAGAGAG GANGGGTTAT GCAGCAATTA TTGA 

This encodes a protein having amino acid sequence <SEQ ID 614>: 

1 MESTXSLQAN LYXRLTPAGA FYAVSSDXPS AGKTLLHSLL KADADEMVSS 

51 EKLLTWAXTA DIDTALNLLY RLQKLEFLYG DENGHSDGIN LSDEQLPLLM 

101 EQLSGSGKAL LVDRNGLYLA NANFHHEAAE ELGLLAAEVA QMEKKYRLXI 

151 KNNLYINNNA WGVCDPSGQS ELTFFPLYIG STKFILVIGG IPDLGKEAFV

201 TLVRXLYXXL QQPRVKLGRE XGLCSNY* 

ORF143a and ORF143-1 show 97.1% identity in 207 aa overlap: 



orf 14 3a. pep MESTXSLQANLYXRLTPAGAFYAVSSDXPSAGKTLLHSLLKADADEMVSSEKLLTWAXTA 
MM I M I I M I M M I I I M I I M I M II M M I M II I I I I II M M II II I M 

orf 14 3-1 MESTLSLQANLYPRLTPAGAFYAVSSDAPSAGKTLLHSLLKADADEMVSSEKLLTWADTA 



orf 143a . pep DIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLA 
M M I M II I M M M I II M M M M M I I II 1 M M I i M M I M I M M I II M M I 
orf 14 3-1 DIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLA 

orf 14 3a . pep NANFHHEAAEELGLLAAEVAQMEKKYRLXIKNNLYINNNAWGVCDPSGQSELTFFPLYIG 
M I M M M M II I M II M I I I M II I M M I II I M II M M M M II M I M M M 
orf!4 3-l NANFHHEAAEELGLLAAEVAQMEKKYRLLIKNNLYINNNAWGVCDPSGQSELTFFPLYIG 



O r f 1 4 3 a . pep STKFILVIGGI PDLGKE AFVTLVRXLY 
I M M I II II I I M I M II M I M M 
orf 14 3-1 STKFILVIGGI PDLGKEAFVTLVRILY 



Homology with a predicted ORF from N. gonorrhoeae 

ORF143 shows 95.5% identity over a 110aa overlap with a predicted ORF (ORF143ng) from
N. gonorrhoeae: 



orf 14 3. pep MRTKWSAVRSCTWADTADIDTALNLLYRLQKLEFLYGDENGHSDGINLXDEQLPLLMEQL 60 

I I I I I I I I I I I : II I II M M I I II I I I M II I I II I II I I I II I I I M M M M I II 

orfl4 3ng MRTKWSAVRSCSRADTADIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQL 60 

orf 143 . pep SGSGKALLVDRNGLYLANANFHHEAAEELGLLAAEVAQMEKKYRLLIKNN 110 

M M M I I M II I M M M II I I I : M I I M M I I II I M II I I M I : II 

orfl4 3ng SGSGECALLVDRNGLYLANANFHHESAEELGLLAAEVAQMEKKYRLLIRNNLYINNNAWGV 120 
An ORF143ng nucleotide sequence <SEQ ID 615> was predicted to encode a protein having amino
acid sequence <SEQ ID 616>:

1 MRTKWSAVRS CSRADTADID TALNLLYRLQ KLEFLYGDEN GHSDGINLSD 

51 EQLPLLMEQL SGSGKALLVD RNGLYLANAN FHHESAEELG LLAAEVAQME 

101 KKYRLLIRNN LYINNNAWGV CDPSGQSELT F FPLYIGSTK FILVIAGI PD 

151 LSKGGICYFG KDFIPPLQQP RVKLGTGGIM RQLLISILED LNNTSTDIIA 

201 SAVISTDGLP MATMLPSHLN SDRVGAISAT LLALGSRSVQ ELACGELEQV 

251 MIKGKSGYIL LSQAGKDAVL VLVAKETG RL GLILLDAKRA ARHIA EAI* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 617>:



1 ATGGAATCAA CACTTTCACT ACAAGCGAAT TTATATCCCT GCCTGACTCC 

51 TGCCGGTGCA TTTTATGCCG TATCCAGCGA TGCCCCCAGT GCCGGTAAAA 

101 CTTTGTTGCG CAGCCTGTTG AAAGCGGATG CGGACGAAGT GGTCAGCAGT 

151 GAGAAGCTGC TCGCGGCGGA CACCGCCGAC ATCGATACCG CTTTGAACCT 

201 GTTGTACCGT TTGCAAAAAC TCGAATTCCT CTATGGCGAT GAAAACGGTC 

251 ATTCAGACGG CATCAATTTG TCGGACGAGC AATTGCCGTT GCTGATGGAA 

301 CAATTGTCCG GCAGCGGTAA GGCATTATTG GTCGATCGGA ACGGTCTGTA 

351 TCTTGCCAAC GCCAATTTCC ATCATGAGTC GGCGGAAGAG TTGGGGTTGT 

4 01 TGGCGGCAGA AGTCGCACAG ATGGAAAAGA AATACCGGCT GCTGATTAGG 

4 51 AACAACCTGT ATATCAACAA TAACGCTTGG GGCGTTTGCG ATCCTTCCGG 

501 TCAGAGCGAA TTGACATTTT TCCCATTGTA TATCGGTTCA ACCAAATTTA 

551 TTTTGGTTAT CGCCGGCATT CCCGATTTGA GCAAAGAGGC ATTTGTTACT 

601 TTGGTAAGGA TTTTATACCG CCGTTACAGC AACCGCGTGT AA 

This corresponds to the amino acid sequence <SEQ ID 618; ORF143ng-1>:

1 MESTLSLQAN LYPCLTPAGA FYAVSSDAPS AGKTLLRSLL KADADEVVSS

51 EKLLAADTAD IDTALNLLYR LQKLEFLYGD ENGHSDGINL SDEQLPLLME 

101 QLSGSGKALL VDRNGLYLAN ANFHHESAEE LGLLAAEVAQ MEKKYRLLIR 

151 NNLYINNNAW GVCDPSGQSE LTFFPLYIGS TKFILVIAGI PDLSKEAFVT

201 LVRILYRRYS NRV* 

ORF143ng-1 and ORF143-1 show 95.8% identity in 214 aa overlap:



orf 143ng-l . pep MESTLSLQANLYPCLTPAGAFYAVSSDAPSAGKTLLRSLLKADADEWSSEKLLA-ADTA 59 

I t M I I I I I I I M M I I I I ( I I I I M I I I I i I I I I : M I I I M II : M I! 11 I : I I I I 
orf 14 3-1 MESTLSLQANLYPRLTPAGAFYAVSSDAPSAGKTLLHSLLKADADEMVSSEKLLTWADTA 60 

orf 143ng-l -pep D I DT ALN LLYRLQKLE FL YGDENG H S DG IN L S DEQL P LLME QL S G S GKALLV DRN G LY LA 119 

M I I I I I I I I I I I I II I II I I I I I I I M I I I II I I I I I M I I I i I I I I I I I I II I I I I I I 
orf 14 3-1 DIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLA 120 

orf 143ng-l .pep NANFHHESAEELGLLAAEVAQMEKKYRLLIRNNLYINNNAWGVCDPSGQSELTFFPLYIG 179 

M I I I I I : I I I II I I I I I I I I I I I I I I I I I : I I I I I I II I M I I I I I I M I I I I I I M M 
orf 143-1 NANFHHEAAEELGLLAAEVAQMEKKYRLLIKNNLYINNNAWGVCDPSGQSELTFFPLYIG 180 

orf 143ng-l .pep STKFILVIAGIPDLSKEAFVTLVRILYRRYSNRV 213 

I I I I I I I I : I I I II : I I I I I I I II I I I I I I I II I 
orfl4 3-l STKFILVIGGIPDLGKEAFVTLVRILYRRYSNRV 214 
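
Identity figures such as the one quoted above are obtained by counting identical residues across the aligned positions. The following sketch illustrates the calculation on the final block of this alignment only; it assumes two pre-aligned strings of equal length in which `-` marks a gap, and the function name `percent_identity` is illustrative rather than part of any alignment program used for this analysis.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns holding identical residues.

    Both inputs must already be aligned (equal length, '-' for gaps).
    Gap columns count towards the overlap length but never as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Final block of the ORF143ng-1 / ORF143-1 alignment (34 columns).
block_ng = "STKFILVIAGIPDLSKEAFVTLVRILYRRYSNRV"
block_mc = "STKFILVIGGIPDLGKEAFVTLVRILYRRYSNRV"
print(round(percent_identity(block_ng, block_mc), 1))  # 94.1
```

This block has 32 identical residues in 34 columns. The 95.8% figure in the text refers to the full 214 aa overlap, not to this block.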

Based on the presence of the putative transmembrane domains in the gonococcal protein, it is
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 74 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 619>: 

1 ATGACCTTTT TACAACGTTT GCAAGGTTTG GCAGACAATA AAATCTGTGC 
51 GTTTGCATGG TTCGTCGTCC GCCGCTTTGA TGAAGAACGC GTACCGCAGr 
101 CGGCGGCAAG CATGACGTTT ACGACGCTGC TGGCACTCGT CCCCGTGCTG 
151 ACCGTGATGG TGGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGCTGGTC

201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CA.GGCGCGG 

251 ACATGGTGTT CGACTATATC AATGCGTTCC GCGAGCAGGC GAACCGGCTG 

301 ACGGCAATCG GCAGCGTGAT GCTGGTCGTT ACCTCGCTGA TGCTGATTCG 

351 GACGATAGAC AATACGTTCA ACCGCATCTG GaCGGGTCAA wTyCCAGCGT 

401 CCGTGGATG . . 

This corresponds to the amino acid sequence <SEQ ID 620; ORF144>: 

1 MTFLQRLQGL ADNKICAFAW FVVRRFDEER VPQXAASMTF TTLLALVPVL
51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP XGADMVFDYI NAFREQANRL
101 TAIGSVMLVV TSLMLIRTID NTFNRIWRVX XQRPWM...

Further work revealed the complete nucleotide sequence <SEQ ID 621>:



1 ATGACCTTTT TACAACGTTT GCAAGGTTTG GCAGACAATA AAATCTGTGC 

51 GTTTGCATGG TTCGTCGTCC GCCGCTTTGA TGAAGAACGC GTACCGCAGG 

101 CGGCGGCAAG CATGACGTTT ACGACGCTGC TGGCACTCGT CCCCGTGCTG 

151 ACCGTGATGG TGGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGCTGGTC 

201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CAGGGCGCGG 

251 ACATGGTGTT CGACTATATC AATGCGTTCC GCGAGCAGGC GAACCGGCTG 

301 ACGGCAATCG GCAGCGTGAT GCTGGTCGTT ACCTCGCTGA TGCTGATTCG 

351 GACGATAGAC AATACGTTCA ACCGCATCTG GCGGGTCAAT TCCCAGCGTC 

4 01 CGTGGATGAT GCAGTTTCTC GTCTATTGGG CTTTACTGAC GTTCGGGCCG 

4 51 CTGTCTTTGG GCGTGGGCAT TTCCTTTATG GTCGGCTCGG TACAGGATGC 

501 CGCGCTTGCC TCAGGTGCGC CGCAGTGGTC GGGCGCGTTG CGAACGGCGG 

551 CGACGCTGAC CTTCATGACG CTTTTGCTGT GGGGGCTGTA CCGCTTCGTG 

601 CCAAACCGCT TCGTTCCCGC GCGGCAGGCG TTTGTCGGGG CTTTGGCAAC 

651 AGCGTTTTGT CTGGAAACCG CGCGCTCCCT CTTCACTTGG TATATGGGCA 

7 01 ATTTCGACGG CTACCGCTCG ATTTACGGCG CGTTTGCCGC CGTGCCGTTT 

7 51 TTTCTGTTGT GGCTGAACCT GTTGTGGACG CTGGTCTTGG GCGGCGCGGT 

801 GCTGACTTCT TCACTCTCCT ACTGGCAGGG AGAAGCGTTC CGCAGGGGCT 

851 TCGACTCGCG CGGACGGTTT GACGACGTGT TGAAAATCCT GCTGCTTCTG 

901 GATGCGGCGC AAAAAGAAGG CAAAGCCTTG CCTGTTCAGG AGTTCAGACG 

951 GCATATCAAT ATGGGCTACG ACGAGTTGGG CGAGCTTTTG GAAAAGCTGG 

1001 CGCGGCACGG CTACATCTAT TCCGGCAGAC AGGGTTGGGT GTTGAAAACG 

1051 GGGGCGGATT CGATTGAGTT GAACGAACTC TTCAAGCTCT TCGTTTACCG 

1101 TCCGTTGCCT GTGGAAAGGG ATCATGTGAA CCAAGCTGTC GATGCGGTAA 

1151 TGACACCGTG TTTGCAGACT TTGAACATGA CGCTGGCAGA GTTTGACGCT 

1201 CAGGCGAAAA AACGGCAGTA G 

This corresponds to the amino acid sequence <SEQ ID 622; ORF144-1>:



1 MTFLQRLQGL ADNKICAFA W FVVRRFDEER VPQAAASMTF TT LLALVPVL 

51 TVMVAVASI F PVFDRWSDSF VSFVNQTIVP QGADMVFDYI NAFREQANRL 

101 TAIGSVMLVV TSLMLI RTID NTFNRIWRVN SQRPWMMQFL VYWALLTFGP 

151 LSLGVGISFM V GSVQDAALA SGAPQWSGAL RTAATLTFMT LLLWGLYRFV 

201 PNRFVPARQA FVGALATAFC LETARSLFTW YMGNFDGYRS IYGAFAAVPF 

251 FLLWLNLLWT LVL GGAVLTS SLSYWQGEAF RRGFDSRGRF DDVLKILLLL 

301 DAAQKEGKAL PVQEFRRHIN MGYDELGELL EKLARHGYIY SGRQGWVLKT 

351 GADSIELNEL FKLFVYRPLP VERDHVNQAV DAVMTPCLQT LNMTLAEFDA 

401 QAKKRQ*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF144 shows 96.3% identity over a 136aa overlap with an ORF (ORF144a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf 144 .pep MTFLQRLQGLADNKICAFA WFWRRFDEERVFQXAASMTFTT LLALVPVLTVMVAVASI F 
I I I II I I I I I I I I II it I I I I I I I I I I I I I M I I I M I I I I I I I I I I I I I I M I I I I I I 
orf 14 4a MTFLQRLQGLADNKICAFA WFWRRFDEERVPQAAASMTFTT LLALVPVLTVMVAVASI F 

10 20 30 40 50 60 
orf144.pep  PVFDRWSDSFVSFVNQTIVPXGADMVFDYINAFREQANRLTAIGSVMLVVTSLMLIRTID
            |||||||||||||||||||| ||||||||||||||||||||||||||||||| |||||||
orf144a     PVFDRWSDSFVSFVNQTIVPQGADMVFDYINAFREQANRLTAIGSVMLVVTSXMLIRTID

70 80 90 100 110 120 

130 

orf 144 .pep NT FNRIWRVXXQRPWM 
I ! M M I I I I I I I I 

orf 14 4a NT FNRIWRVNSORPWMMQFLVYWA LLT FGPLSLGVG I S FXV GS VQDAALASGAPQWSGAL 

130 140 150 160 170 180 

The complete length ORF144a nucleotide sequence <SEQ ID 623> is: 

1 ATGACCTTTT TACAACGTTT GCAAGGTTTG GCAGACAATA AAATCTGTGC 

51 GTTTGCATGG TTCGTCGTCC GCCGCTTTGA TGAAGAACGC GTACCGCAGG 

101 CGGCGGCAAG CATGACGTTT ACGACACTGC TGGCACTCGT CCCCGTGCTG 

151 ACCGTGATGG TGGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGNTGGTC 

201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CAGGGCGCGG 

251 ACATGGTNTT CGACTATATC AATGCGTTCC GCGAGCAGGC GAACCGGCTG 

301 ACGGCAATCG GCAGCGTGAT GCTGGTCGTT ACCTCGCNGA TGCTGATTCG 

351 GACGATAGAC AATACGTTCA ACCGCATCTG GCGGGTCAAT TCCCAGCGTC

4 01 CGTGGATGAT GCAGTTTCTC GTCTATTGGG CTTTACTGAC GTTCGGGCCG 

4 51 CTGTCTTTGG GCGTGGGCAT TTCCTTTATN GTCGGCTCGG TACAGGATGC 

501 CGCGCTTGCC TCAGGTGCGC CGCAGTGGTC GGGCGCGTTG CGAACGGCGG 

551 CGACGCTGAN CTTCATGACG CTTTTGCTGT GGGGGCTGTA CCGCTNCGTG 

601 CCAAACCGCT TCGTTCCCGC GCGGCANGCG TTTGTCGGGG CTTTGGCAAC 

651 AGCGTTCTGT CTGGAAACCG CGCGTTCCCT CTTTACTTGG TATATGGGCA 

7 01 ATTTCGACGG CTACCGCTCG ATTTACGGNG CGTTTGCCGC CGTGCCGTTT 

7 51 TTTCTGTTGT GGCTGAACCT GTTGTGGACG CTGGTCTTGG GCGGCGCGGT 

801 GCTGACTTCT TCACTCTCCT ACTGGCAGGG AGAAGCGTTC CGCAGGGNCT 

851 TCGACTCGCG CGGACGGTTT GACGACGTGT TGAAAATCCT GCTGCTTCTG 

901 GATGCGGCGC AAAAAGAAGG CNAAGCCTTG CCTGTTCAGG AGTTCAGACG 

951 GCATATCAAT ATGGGCTACG ACGAGTTGGG CGAGCTTTTG GAAAAGCTGG 

1001 CGCGGCACGG CTACATCTAT TCCGGCAGAC AGGGTTGGGT GTTGAAAACG 

1051 GGGGCGGATT CGATTGAGTT GAACGAACTC TTCAAGCTCT TCGTTTACCG 

1101 TCCGTTGCCT GTGGAAAGGG ATCATGTGAA CCAAGCTGTC GATGCGGTAA 

1151 TGATGCCGTG TTTGCAGACT TTGAACATGA CGCTGGCAGA GTTTGACGCT 

1201 CAGGCGAAAA AACAGCAGCA ATCTTGA 

This encodes a protein having amino acid sequence <SEQ ID 624>: 

1 MTFLQRLQGL ADNKICAFA W FVVRRFDEER VPQAAASMTF TTLLALVPVL

51 TVMVAVASI F PVFDRWSDSF VSFVNQTIVP QGADMVFDYI NAFREQANRL 

101 TAIGSVMLVV TSXMLI RTID NTFNRIWRVN SQRPWMMQFL VYWALLTFGP

151 LSLGVGISFX V GSVQDAALA SGAPQWSGAL RTAATLXFMT LLLWGLYRXV 

201 PNRFVPARXA FVGALATAFC LETARSLFTW YMGNFDGYRS IYGAFAAVPF 

251 FLLWLNLLWT LVL GGAVLTS SLSYWQGEAF RRXFDSRGRF DDVLKILLLL 

301 DAAQKEGXAL PVQE FRRHIN MGYDELGELL EKLARHGYIY SGRQGWVLKT 

351 GADSIELNEL FKLFVYRPLP VERDHVNQAV DAVMMPCLQT LNMTLAEFDA 

401 QAKKQQQS * 

ORF144a and ORF144-1 show 97.8% identity in 406 aa overlap: 

orf 14 4a. pep MTFLQRLQGLADNKICAFAWFVVRRFDEERVPQAAASMT FTTLLALVPVL TVMVAVASIF 
I 11 M I I M 1 II I I I II M I I I I M ! I 1 I I M I I I I M I i I I I I I I I I II M I I I I I II I 
orfl4 4-l MTFLQRLQGLADNKICAFAWFWRRFDEERVPQAAASMTFTTLLALVPVLTVMVAVASIF 

orf 144a . pep PVFDRWSDSFVSFVNQTIVPQGADMVFDYINAFREQANRLTAIGSVMLWTSXMLIRTID 
I I I I I I I 1 I I I II I I I I I I I I I I I I M I I I ! I I I I II I I I I I I I I M I II M I I I I I I I 
orf 14 4-1 PVFDRWSDSFVSFVNQTIVPQGADMVFDYINAFREQANRLTAIGSVMLVVTSLMLIRTID 

orf 14 4a . pep NTFNRIWRVNSQRPWMMQFLVYWALLT FGPLSLGVG IS FXVGS VQDAALASGAPQWSGAL 
I I I I I I 11 I I I II I I I I II I I ( II 11 I I I I 11 I I I I I I I I I I I M I II I I I I I I I I I I I 
orfl4 4-l NTFNRIWRVNSQRPWMMQFLVYWALLTFGPLSLGVGISFMVGSVQDAALASGAPQWSGAL 

orf 14 4a . pep RTAATLXFMTLLLWGLYRXVPNRFVPARXAFVGALATAFCLETARSLFTWYMGNFDGYRS 
I I M I I : M I I I M I II 1 I I I I 1 I I I I I 11 I I M I II II I I || I I I I I | | | | | | | I | | 
orf 1 4 4-1 RTAATLTFMTLLLWGLYRFVPNRFVPARQAFVGALATAFCLETARSLFTWYMGNFDGYRS 
orf144a.pep IYGAFAAVPFFLLWLNLLWTLVLGGAVLTSSLSYWQGEAFRRXFDSRGRFDDVLKILLLL
            |||||||||||||||||||||||||||||||||||||||||| |||||||||||||||||
orf144-1    IYGAFAAVPFFLLWLNLLWTLVLGGAVLTSSLSYWQGEAFRRGFDSRGRFDDVLKILLLL

orf 144a pep DAAQKEGXALPVQEFRRHINMGYDELGELLEKLARHGYIYSGRQGWVLKTGADSIELNEL 
I I | 1 I I I 111 M I I I M M M I U I M I M I I I M III M I I i I I I i M I t M III I I t 
orf 144-1 DAAQKEGKALPVQEFRRHINMGYDELGELLEKLARHGYIYSGRQGWVLKTGADSIELNEL 

orfl4 4a pep FKLFVYRPLPVERDHVNQAVDAVMMPCLQTLNMTLAEFDAQAKKQQQS 408 

M ! I I I M M M I M I 1 I M I I II I II I M 1 M I I I II M I M : I 
orf 14 4-1 FKLFVYRPLPVERDHVNQAVDAVMTPCLQTLNMTLAEFDAQAKKRQ 406 

Homology with a predicted ORF from N. gonorrhoeae

ORF144 shows 91.2% identity over a 136aa overlap with a predicted ORF (ORF144ng) from
N. gonorrhoeae:

orf 144 pep MTFLQRLQGLADNKICAFAWFWRRFDEERVPQXAASMTFTTLLALVPVLTVMVAVASIF 60 

M | I I II I I! M I I II II I : I II : I I I I I I I II I I M I I I I M M I I I II I I I M I 
orfl4 4ng MT FLQCWQG S ADNK I C AFAW FV I RRFSEERVPQAAASMT FTTLLALV PVLTVMV AVAS I F 60 

orf 144 .pep PVFDRWSDSFVSFVNQTIVPXGADMVFDYINAFREQANRLTAIGSVMLWTSLMLIRTID 120 

M I I I M I I II I I I I I I M I I I I M I I I I : I I I : M M I I I I I I I I I I I I I M I I II I I 
orfl4 4ng P VFDRWS D S FVS FVNQT I VPQGADMV FD Y I DAFRDQANRLT AI G S VMLVVT S LML I RT I D 120 

orf 144. pep NTFNRIWRVXXQRPWM 136 
1:1111111 '-11111 

orfl4 4ng NAFNRIWRVNTQRPWMMQFLVYWALLTFGPLSLGVGISFMVGSVQDSVLSSGAQQWADAL 180 

The complete length ORF144ng nucleotide sequence <SEQ ID 625> is predicted to encode a 
protein having amino acid sequence <SEQ ID 626>: 



1 MTFLQCWQGS ADNKICAFAW FVIRRFSEER VPQAAASMTF TTLLALVPVL 

51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP QGADMVFDYI DAFRDQANRL 

101 TAIGSVMLVV TSLMLIRTID NAFNRIWRVN TQRPWMMQFL VYWALLTFGP 

151 LSLGVGISFM VGSVQDSVLS SGAQQWADAL KTAARLAFMT LLLWGLYRFV 

201 PNRFVPARQA FVGALITAFC LETARFLFTW YMGNFDGYRS IYGAFAAVPF 

251 FLLWLNLLWT LVLGGAVLTS SLSYWQGEAF RRGFDSRGRF DDVLKILLLL 

301 DAAQKEGRTL SVQEFRRHIN MGYDELGELL EKLARYGYIY SGRQGWVLKT 

351 GADSIELSEL FKLFVYRPLP VERDHVNQAV DAVMTPCLQT LNMTLAEFDA 

401 QAKKQQQS* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 627>: 



1 ATGACCTTTT TACAACGTTG GCAAGGTTTG GCGGACAATA AAATCTGTGC 

51 ATTTGCATGG TTCGTCATCC GCCGTTTCAG TGAAGAGCGC GTACCGCAGG 

101 CAGCGGCGAG CATGACGTTT ACGACACTGC TGGCACTCGT CCCCGTACTG 

151 ACCGTAATGG TCGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGCTGGTC 

201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CAGGGCGCGG 

251 ATATGGTGTT CGACTATATC GACGCATTCC GCGATCAGGC AAACCGGCTG 

301 ACCGCCATCG GCAGCGTGAT GCTGGTCGTA ACCTCGCTGA TGCTGATTCG 

351 GACGATAGAC AATGCGTTCA ACCGCATCTG GCGGGTTAAC ACGCAACGCC 

401 CCTGGATGAT GCAGTTCCTC GTTTATTGGG CGTTGCTGAC TTTCGGGCCT 

451 TTGTCTTTGG GTGTGGGCAT TTCCTTTATG GTCGGGTCGG TTCAAGACTC 

501 CGTACTCTCC TCCGGAGCGC AACAATGGGC GGACGCGTTG AAGACGGCGG 

551 CAAGGCTGGC TTTCATGACG CTTTTGCTGT GGGGGCTGTA CCGCTTCGTG 

601 CCCAACCGCT TCGTGCCCGC CCGGCAGGCG TTTGTCGGAG CTTTGATTAC 

651 GGCATTCTGC CTGGAGACGG CACGTTTCCT GTTCACCTGG TATATGGGCA 

701 ATTTCGACGG CTACCGCTCG ATTTACGGCG CATTTGCCGC CGTGCCGTTT 

751 TTCCTGCTGT GGTTAAACCT GCTGTGGACG CTGGTCTTGG GCGGGGCGGT 

801 GCTGACTTCG TCGCTGTCTT ATTGGCAGGG CGAGGCCTTC CGCAGGGGAT 

851 TCGACTCGCG CGGACGGTTT GACGACGTGT TGAAAATCCT GCTGCTTCTG 

901 GATGCGGCGC AAAAAGAAGG CCGAACCCTG TCCGTTCAGG AGTTCAGACG 

951 GCATATCAAT ATGGGTTACG ATGAATTGGG CGAGCTTTTG GAAAAGCTGG 

1001 CGCGGTACGG CTATATCTAT TCCGGCAGAC AGGGCTGGGT TTTGAAAACG 

1051 GGGGCGGATT CGATTGAGTT GAGCGAACTC TTCAAGCTCT TCGTGTACCG 
1101 CCCGTTGCct gtggaAAGGG ATCATGTGAA CCAAGCTGtc gaTGCGGTAA 
1151 TGAcgccgtG TTTGCAGACT TTGAACATGA CGCTGGCGGA GTTTGACGCT 
1201 CAGgcgAAAA AACAGCAGCA GTCTTGA 

This encodes a variant of ORF144ng, having the amino acid sequence <SEQ ID 628; ORF144ng-1>: 

1 MTFLQRWQGX ADNKICAFAW FVIRRFSEER VPQAAASMTF TTLLALVPVL 
51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP QGADMVFDYI DAFRDQANRL 
101 TAIGSVMLVV TSLMLIRTID NAFNRIWRVN TQRPWMMQFL VYWALLTFGP 
151 LSLGVGISFM VGSVQDSVLS SGAQQWADAL KTAARLAFMT LLLWGLYRFV 
201 PNRFVPARQA FVGALITAFC LETARFLFTW YMGNFDGYRS IYGAFAAVPF 
251 FLLWLNLLWT LVLGGAVLTS SLSYWQGEAF RRGFDSRGRF DDVLKILLLL 
301 DAAQKEGRTL SVQEFRRHIN MGYDELGELL EKLARYGYIY SGRQGWVLKT 
351 GADSIELSEL FKLFVYRPLP VERDHVNQAV DAVMTPCLQT LNMTLAEFDA 
401 QAKKQQQS* 

ORF144ng-1 and ORF144-1 show 94.1% identity in 406 aa overlap: 

orfl4 4ng-l.pep MT FLQRWQG LADNK I C AFAW FV I RR FS E ERV PQAAASMT FTT L L ALVP VLT VMVAVAS I F 
MINI I M I I M I I I I ) I I I : I I I : I I I M I II I I t t II I I I I i 1 I I I I I II It M II 
orfl4 4~l MTFLQRLQGLADNKICAFAWFWRRFDEERVPQAAASMTFTTLLALVPVLTVMVAVASIF 

20 orf 14 4ng-l .pep PVFDRWSDSFVSFVNQTIVPQGADMVFDYIDAFRDQANRLTAIGSVMLWTSLMLIRTID 

M II I M I I! I M M I M I I I M I I II I II : ! I I : I I I I I 1 I II II II I M I 1 I II I I M 
orf 14 4-1 PVFDRWS DS FVS FVNQT I VPQGADMVFD YINAFREQANRLTAI GS VMLVVTS LMLIRT I D 

orf 14 4ng-l .pep NAFNRIWRVNTQRPWMMQFLVYWALLTFGPLSLGVG I SFMVGSVQDSVLS SGAQQWADAL 
25 I : I I M ) M 1 : I I I I I I I I I I I I I I II 1 M II I 1 1 I I I I M I I I II :: I : II I II: II 

orf 14 4-1 NTFNRIWRVNSQRPWMMQFLVYWALLTFGPLSLGVGISFMVGSVQDAALASGAPQWSGAL 

orf 14 4ng-l .pep KTAARLAFMTLLLWGLYRFVPNRFVPARQAFVGALITAFCLETARFLFTWYMGNFDGYRS 
: I 1 I ! : M I M II I II M I I I II I 1 I I I 1 I I I I I I I I II M I I 11 I I M I I I M I I I 
30 orf 14 4-1 RTAATLTFMTLLLWGLYRFVPNRFVPARQAFVGALATAFCLETARSLFTWYMGNFDGYRS 






orf 14 4ng-l . pep IYGAFAAVPFFLLWLNLLWTLVLGGAVLTSSLSYWQGEAFRRGFDSRGRFDDVLKILLLL 
M || I I I ! II M II I II ! I I II II 1 I II I I I I I I I I I I I I I I I II I I M I t I I I I I I I I I 
orf 14 4-1 IYGAFAAVPFFLLWLNLLWTLVLGGAVLTSSLSYWQGEAFRRGFDSRGRFDDVLKILLLL 

orf 14 4ng-l.pep DAAQKEGRTLSVQEFRRHINMGYDELGELLEKLARYGYIYSGRQGWVLKTGADSIELSEL 
NIMH::! II II II I ! I I I I I I I I I I I I I I I I : I I I I I I I I I I I II I II I I I I I : I I 
orf 14 4-1 DAAQKEGKALPVQEFRRHINMGYDELGELLEKLARHGYIYSGRQGWVLKTGADSIELNEL 



40 orf 144ng-l .pep FKLFVYRPLPVERDHVNQAVDAVMTPCLQTLNMTLAEFDAQAKKQQQS 

M I I I I I I I 1 I I M I I I I I I I II I I I I I I I I I I I I I I M I I I M : I 
orf 14 4-1 FKLFVYRPLPVERDHVNQAVDAVMTPCLQTLNMTLAEFDAQAKKRQ 

On the basis of this analysis, including the identification of several putative transmembrane 
domains in the gonococcal protein, it is predicted that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 

Example 75 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 629>: 

1 ..AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA 

51 AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA 

101 GCACCGATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC 

151 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG 

201 CCTGCTTGAA ACACGGGAAC ACGGCTGA 
This corresponds to the amino acid sequence <SEQ ID 630; ORF146>: 

1 ..RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTDMRQE ISALVILLQR 
51 TRRKWLDAHE RQHLRQSLLE TREHG* 
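
The correspondence between the partial nucleotide sequence <SEQ ID 629> and the amino acid sequence <SEQ ID 630> can be checked by conceptual translation. The following is a minimal sketch, assuming the standard genetic code and a reading frame starting at the first listed base; the names `translate` and `SEQ_ID_629` are illustrative and are not part of the original disclosure.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# (third position varying fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string in frame 1; unknown codons become 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


# Partial meningococcal sequence <SEQ ID 629>, copied from the listing above.
SEQ_ID_629 = (
    "AGACACGCCC" "GCCGCATCCG" "CATCGACACC" "GCCATCAACC" "CCGAACTGGA"
    "AGCCCTCGCC" "GAACACCTCC" "ACTACCAATG" "GCAGGGCTTC" "CTCTGGCTCA"
    "GCACCGATAT" "GCGTCAGGAA" "ATTTCCGCCC" "TCGTCATCCT" "GCTGCAACGC"
    "ACCCGCCGCA" "AATGGCTGGA" "TGCCCACGAA" "CGCCAACACC" "TGCGCCAAAG"
    "CCTGCTTGAA" "ACACGGGAAC" "ACGGCTGA"
)

print(translate(SEQ_ID_629))
```

The printed string can be compared residue by residue with the listing of <SEQ ID 630>; the terminal `*` marks the stop codon.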

Further work revealed the complete nucleotide sequence <SEQ ID 631>: 

1 ATGAACACCT CGCAACGCAA CCGCCTCGTC AGCCGCTGGC TCAACTCCTA 

51 CGAACGCTAC CGCTACCGCC GCCTCATCCA CGCCGTCCGG CTCGGCGGGG 

101 CCGTCCTGTT CGCCACCGCC TCCGCCCGGC TGCTCCACCT CCAACACGGC 

151 GAGTGGATAG GGATGACCGT CTTCGTCGTC CTCGGCATGC TCCAGTTTCA 

201 AGGGGCGATT TACTCCAAGG CGGTGGAACG TATGCTCGGC ACGGTCATCG 

251 GGCTGGGCGC GGGTTTGGGC GTTTTATGGC TGAACCAGCA TTATTTCCAC 

301 GGCAACCTCC TCTTCTACCT CACCGTCGGC ACGGCAAGCG CACTGGCCGG 

351 CTGGGCGGCG GTCGGCAAAA ACGGCTACGT CCCTATGCTG GCAGGGCTGA 

401 CGATGTGTAT GCTCATCGGC GACAACGGCA GCGAATGGCT CGACAGCGGA 

451 CTCATGCGCG -CCATGAACGT CCTCATCGGC GCGGCCATCG CCATCGCCGC 

501 CGCCAAACTG CTGCCGCTGA AATCCACACT GATGTGGCGT TTCATGCTTG 

551 CCGACAACCT GGCCGACTGC AGCAAAATGA TTGCCGAAAT CAGCAACGGC 

601 AGGCGCATGA CCCGCGAACG CCTCGAGGAG AACATGGCGA AAATGCGCCA 

651 AATCAACGCA CGCATGGTCA AAAGCCGCAG CCATCTCGCC GCCACATCGG 

701 GCGAAAGCCG CATCAGCCCC GCCATGATGG AAGCCATGCA GCACGCCCAC 

751 CGTAAAATCG TCAACACCAC CGAGCTGCTC CTGACCACCG CCGCCAAGCT 

801 GCAATCTCCC AAACTCAACG GCAGCGAAAT CCGGCTGCTT GACCGCCACT 

851 TCACACTGCT CCAAACCGAC CTGCAACAAA CCGTCGCCCT TATCAACGGC 

901 AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA 

951 AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA 

1001 GCACCAATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC 

1051 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG 

1101 CCTGCTTGAA ACACGGGAAC ACGGCTGA 
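
Sequence listings such as the one above are laid out as position-numbered rows of ten-residue groups. A minimal sketch of how such a listing might be collapsed into one contiguous string for further analysis is given below; the helper name `parse_listing` is hypothetical, and the approach simply assumes that every letter in the listing belongs to the sequence and every digit or space is layout.

```python
import re


def parse_listing(text):
    """Collapse a numbered, grouped sequence listing into one string.

    Letters (and '*' for a stop codon in protein listings) are kept;
    position numbers, spaces and line breaks are discarded.
    """
    return "".join(re.findall(r"[A-Za-z*]", text)).upper()


# Last two rows of <SEQ ID 631>, as printed above.
listing = """
1051 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG
1101 CCTGCTTGAA ACACGGGAAC ACGGCTGA
"""

sequence = parse_listing(listing)
print(len(sequence), sequence[-3:])
```

Because the position numbers are dropped entirely, the same helper also tolerates rows whose position number has been split by a stray space.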

This corresponds to the amino acid sequence <SEQ ID 632; ORF146-1>: 

1 MNTSQRNRLV SRWLNSYERY RYRRLIHAVR LGGAVLFATA SARLLHLQHG 

51 EWIGMTVFVV LGMLQFQGAI YSKAVERMLG TVIGLGAGLG VLWLNQHYFH 

101 GNLLFYLTVG TASALAGWAA VGKNGYVPML AGLTMCMLIG DNGSEWLDSG 

151 LMRAMNVLIG AAIAIAAAKL LPLKSTLMWR FMLADNLADC SKMIAEISNG 

201 RRMTRERLEE NMAKMRQINA RMVKSRSHLA ATSGESRISP AMMEAMQHAH 

251 RKIVNTTELL LTTAAKLQSP KLNGSEIRLL DRHFTLLQTD LQQTVALING 

301 RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTNMRQE ISALVILLQR 

351 TRRKWLDAHE RQHLRQSLLE TREHG* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF146 shows 98.6% identity over a 74aa overlap with an ORF (ORF146a) from strain A of N. 
meningitidis: 

10 20 30 

orf 14 6 . pep R H ARR I R I DT A I N PE LE ALAE HLH YQW QG F 

I I I I I I I I 1 I I I I I I I I I I I I 1 I 1 j 1 I II 1 
orf 14 6a KLNGSEIRLLDRHFTLLQTDLQQTVALINGRHARRIRIDTAINPELEALAEHLHYQWQGF 
280 290 300 310 320 330 

40 50 60 70 

orf 1 4 6 . pep LWLSTDMRQEISALVILLQRTRRKWLDAHERQHLRQSLLETREHGX 

! I I I I : I I I I I 1 I I I I I I i I I i I I I t I II I I I i I ! I I f I I I I I i : 
orf 14 6a LWLSTNMRQEISALVILLQRTRRKWLDAHERQHLRQSLLETREHSX 
340 350 360 370 

The complete length ORF146a nucleotide sequence <SEQ ID 633> is: 

1 ATGAACACCT CGCAACGCAA CCGCCTCGTC AGCCGCTGGC TCAACTCCTA 

51 CGAACGCTAC CGCTACCGCC GCCTCATCCA CGCCGTCCGG CTCGGCGGGG 

101 CCGTCCTGTT CGCCACCGCC TCCGCCCGGC TGCTCCACCT CCAACACGGC 

151 GAGTGGATAG GGATGACCGT CTTCGTCGTC CTCGGCATGC TCCAGTTTCA 
201 AGGGGCGATT TACTCCAAGG CGGTGGAACG TATGCTCGGC ACGGTCATCG 

251 GGCTGGGCGC GGGTTTGGGC GTTTTATGGC TGAACCAGCA TTATTTCCAC 

301 GGCAACCTCC TCTTCTACCT CACCGTCGGC ACGGCAAGCG CACTGGCCGG 

351 CTGGGCGGCG GTCGGCAAAA ACGGCTACGT CCCTATGCTG GCGGGGCTGA 

401 CGATGTGCAT GCTCATCGGC GACAACGGCA GCGAATGGTT CGACAGCGGC 

451 CTGATGCGCG CGATGAACGT CCTCATCGGC GCGGCCATCG CCATCGCCGC 

501 CGCCAAACTG CTGCCGCTGA AATCCACACT GATGTGGCGT TTCATGCTTG 

551 CCGACAACCT GACCGACTGC AGCAAAATGA TTGCCGAAAT CAGCAACGGC 

601 AGGCGCATGA CCCGCGAACG CCTCGAAGAG AACATGGCGA AAATGCGCCA 

651 AATCAACGCA CGCATGGTCA AAAGCCGCAG CCACCTCGCC GCCACATCGG 

701 GCGAAAGCCG CATCAGCCCC GCCATGATGG AAGCCATGCA GCACGCCCAC 

751 CGTAAAATTG TCAACACCAC CGAGCTGCTC CTGACCACCG CCGCCAAGCT 

801 GCAATCTCCC AAACTCAACG GCAGCGAAAT CCGGCTGCTT GACCGCCACT 

851 TCACACTGCT CCAAACCGAC CTGCAACAAA CCGTCGCCCT TATCAACGGC 

901 AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA 

951 AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA 

1001 GCACCAATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC 

1051 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG 

1101 CCTGCTTGAA ACACGGGAAC ACAGTTGA 

This encodes a protein having amino acid sequence <SEQ ID 634>: 

1 MNTSQRNRLV SRWLNSYERY RYRRLIHAVR LGGAVLFATA SARLLHLQHG 

51 EWIGMTVFVV LGMLQFQGAI YSKAVERMLG TVIGLGAGLG VLWLNQHYFH 

101 GNLLFYLTVG TASALAGWAA VGKNGYVPML AGLTMCMLIG DNGSEWFDSG 

151 LMRAMNVLIG AAIAIAAAKL LPLKSTLMWR FMLADNLTDC SKMIAEISNG 

201 RRMTRERLEE NMAKMRQINA RMVKSRSHLA ATSGESRISP AMMEAMQHAH 

251 RKIVNTTELL LTTAAKLQSP KLNGSEIRLL DRHFTLLQTD LQQTVALING 

301 RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTNMRQE ISALVILLQR 

351 TRRKWLDAHE RQHLRQSLLE TREHS* 

ORF146a and ORF146-1 show 99.5% identity in 374 aa overlap: 

orf 14 6a . pep MNTSQRNRLVSRWLNSYERYRYRRLIHAVRLGGAVLFATASARLLHLQHGEWIGMTVFVV 
I I II M I I I M I I I I I M I I I I I I I I M 1 M I I I I I II I t I I I I M I 1 II I M I I I I I 1 I 
orfl4 6-l MNTSQRNRLVSRWLNSYERYRYRRLIHAVRLGGAVLFATASARLLHLQHGEWIGMTVFW 

orf 14 6a. pep LGMLQFQGAI YSKAVERMLGTVIGLGAGLGVLWLNQHYFHGNLLFYLTVGTASALAGWAA 
M I ! I I M I I I I I I I M II I I M I M II I II i I I I I! II I I ! M I I ! M I 1 ! M I ! I I I I 
orf 14 6-1 LGMLQFQGAI YSKAVERMLGTVIGLGAGLGVLWLNQHYFHGNLLFYLTVGTASALAGWAA 

orf 14 6a. pep VGKNGYVPMLAGLTMCMLIGDNGSEWFDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR 
I I I M M I I I I I I II It M I M I I I I : I t I I I I I I I I I M M M I I M I I I ! I II I M M 
orf 14 6-1 VGKNGYVPMLAGLTMCMLIGDNGSEWLDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR 

orf 14 6a. pep FMLADNLTDCSKMIAEISNGRRMTRERLEENMAKMRQINARMVKSRSHLAATSGESRISP 
I II I I I I : I I I I I I M I II M I I I I I I M 1 II I I I M I M I M I II I II j M I M I M I I 
orf 14 6-1 FML ADN L A D C S PCM I AE I S NG R RMT RERL E ENMAKMRQ I N ARM VKS R S H L AAT SGESRISP 

orf 14 6a . pep AMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTVALING 
I I M I I II I I I I I M I I I I I M I M II I I I I I I I II I I M I I I II I I II I M I II I M II 
orf 14 6-1 AMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQT DLQQTVALING 

orf 14 6a . pep RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE 
M I I M II I I I I I I M I II I I I II I I I I I I I I M t II II II I I I II I I M I M I II I II I 
orf 14 6-1 RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE 

orf 14 6a. pep RQHLRQSLLETREHSX 

I I I I I I I I I I I I M : 
orf 14 6-1 RQHLRQSLLETREHGX 

Homology with a predicted ORF from N. gonorrhoeae 

ORF146 shows 97.3% identity over a 75aa overlap with a predicted ORF (ORF146ng) from 
N. gonorrhoeae: 
orf146.pep                                 RHARRIRIDTAINPELEALAEHLHYQWQGF  30
                                           ||||||||||||||||||||||||||||||
orf146ng     KLNGSEIRLLDRHFTLLQTDLQQTAALINGRHARRIRIDTAINPELEALAEHLHYQWQGF 364

orf146.pep   LWLSTDMRQEISALVILLQRTRRKWLDAHERQHLRQSLLETREHG  75
             |||||:|||||||||| ||||||||||||||||||||||||||||
orf146ng     LWLSTNMRQEISALVIPLQRTRRKWLDAHERQHLRQSLLETREHG 409
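
The identity figure quoted for this overlap can be reproduced by counting matching positions. The sketch below assumes an ungapped alignment of the 75 residues of ORF146 <SEQ ID 630> against the C-terminal 75 residues (positions 335-409) of ORF146ng <SEQ ID 636>; the function name `percent_identity` is illustrative, and the alignment program actually used for the original analysis is not stated here.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of identical positions in two equal-length, ungapped sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# ORF146 <SEQ ID 630>, without the leading dots and the stop symbol.
orf146 = (
    "RHARRIRIDT" "AINPELEALA" "EHLHYQWQGF" "LWLSTDMRQE"
    "ISALVILLQR" "TRRKWLDAHE" "RQHLRQSLLE" "TREHG"
)

# Residues 335-409 of ORF146ng <SEQ ID 636>.
orf146ng_tail = (
    "RHARRIRIDT" "AINPELEALA" "EHLHYQWQGF" "LWLSTNMRQE"
    "ISALVIPLQR" "TRRKWLDAHE" "RQHLRQSLLE" "TREHG"
)

mismatches = [
    (i + 1, a, b)
    for i, (a, b) in enumerate(zip(orf146, orf146ng_tail))
    if a != b
]
print(mismatches)
print(round(percent_identity(orf146, orf146ng_tail), 1))
```

Under these assumptions the two sequences differ at two positions (D/N and L/P), which gives 73 matches out of 75, consistent with the 97.3% figure stated above.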

An ORF146ng nucleotide sequence <SEQ ID 635> was predicted to encode a protein having amino 
acid sequence <SEQ ID 636>: 

1 MSGVRFPSPA PIPSTDPPSG SLCFFTFPLQ ThSDMNSSQR KRLSGRWLNS 

51 YERYRHRRLI HAVRLGGTVL FATALARLLH LQHGEWIGMT VFVVLGMLQF 

101 QGAIYSNAVE RMLGTVIGLG AGLGVLWLNQ HYFHGNLLFY LTIGTASALA 

151 GWAAVGKNGY VPMLAGLTMC MLIGDNGSEW LDSGLMRAMN VLIGAAIAIA 

201 AAKLLPLKST LMWRFMLADN LADCSKMIAE ISNGRRMTRE RLEQNMVKMR 

251 QINARMVKSR SHLAATSGES RISPSMMEAM QHAHRKIVNT TELLLTTAAK 

301 LQSPKLNGSE IRLLDRHFTL LQTDLQQTAA LINGRHARRI RIDTAINPEL 

351 EALAEHLHYQ WQGFLWLSTN MRQEISALVI PLQRTRRKWL DAHERQHLRQ 

401 SLLETREHG* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 637>: 

1 ATGAACTCCT CGCAACGCAA ACGCCTTTCC GgccGCTGGC TCAACTCCTA 

51 CGAACGCTac cGCCaccGCC GCCTCATACA TGCCGTGCGG CTCGGCggaa 

101 ccgtCCTGTT CGCCACCGCA CTCGCCCGgc tACTCCACCT CCAacacggc 
151 gAATGGATAG GGAtgaCCGT CTTCGTCGTC CTCGGCATGC TCCAGTTCCA 

201 AGGCgcgatt tActccaacg cggtgGAacg taTGctcggt acggtcatcg 

251 ggctgGGCGC GGGTTTGGgc gTTTTATGGC TGAACCAGCA TTAtttccac 

301 ggcaacCTcc tcttctacct gaccatcggc acggcaagcg cactggccgg 

351 ctGGGCGGCG GTCGGCAAAA acggctacgt ccctatgctg GCGGGGctgA 

401 CGATGTGCAT gctcatcggc gACAACGGCA GCGAATGGCT CGACAGCGGC 

451 CTGATGCGCG CGATGAACGT CCTCATCGGC GCCGCCATCG CCATTGCCGC 

501 CGCCAAACTG CTGCCGCTGA AATCCACACT GATGTGGCGT TTCATGCTTG 

551 CCGACAACCT GGCCGACTGC AGCAAAATGA TTGCCGAAAT CAGCAACGGC 

601 AGGCGTATGA CGCGCGAACG TTTGGAGCAG AATATGGTCA AAATGCGCCA 

651 AATCAACGCA CGCATGGTCA AAAGCCGCAG CCACCTCGCC GCCACATCGG 

701 GCGAAAGCCG CATCAGCCCC TCCATGATGG AAGCCATGCA GCACGCCCAC 

751 CGCAAAATCG TCAACACCAC CGAGCTGCTC CTGACCACCG CCGCCAAGCT 

801 GCAATCTCCC AAACTCAACG GCAGCGAAAT CCGGCTGCTC GACCGCCACT 

851 TCACACTGCT CCAAACCGAC CTGCAACAAA CCGCCGCCCT CATCAACGGC 

901 AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA 

951 AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA 

1001 GCACCAATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC 

1051 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG 

1101 CCTGCTTGAA ACACGGGAAC ACGGCTGA 

This corresponds to the amino acid sequence <SEQ ID 638; ORF146ng-1>: 



1 MNSSQRKRLS GRWLNSYERY RHRRLIHAVR LGGTVLFATA LARLLHLQHG 

51 EWIGMTVFVV LGMLQFQGAI YSNAVERMLG TVIGLGAGLG VLWLNQHYFH 

101 GNLLFYLTIG TASALAGWAA VGKNGYVPML AGLTMCMLIG DNGSEWLDSG 

151 LMRAMNVLIG AAIAIAAAKL LPLKSTLMWR FMLADNLADC SKMIAEISNG 

201 RRMTRERLEQ NMVKMRQINA RMVKSRSHLA ATSGESRISP SMMEAMQHAH 

251 RKIVNTTELL LTTAAKLQSP KLNGSEIRLL DRHFTLLQTD LQQTAALING 

301 RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTNMRQE ISALVILLQR 

351 TRRKWLDAHE RQHLRQSLLE TREHG* 

ORF146ng-1 and ORF146-1 show 96.5% identity in 375 aa overlap: 



orfl46-l. pep MNTSQRNRLVSRWLNSYERYRYRRLIHAVRLGGAVLFATASARLLHLQHGEWIGMTVFW 
I I : M I : I I : M 1 i I I I I I I : I I M I M I I I I : III I I I I I I I I I II I I I M I III I I 
orfl4 6ng-l MNSSQRKRLSGRWLNS YERYRHRRLI HAVRLGGTVL FATALARLLHLQHGEW I GMT VFVV 

orf 14 6-1 . pep LGMLQFQGAI YSKAVERMLGTVIGLGAGLGVLWLNQHYFHGNLLFYLTVGTASALAGWAA 
I II I I I I I I I M : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I II II M II 
orf 14 6ng-l LGMLQFQGAI Y SN AVERMLGT V I GLGAGLGVLWLNQH Y FHGN LL FYLT I GT AS ALAGWAA 
orf146-1.pep VGKNGYVPMLAGLTMCMLIGDNGSEWLDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR 
I | | | | | | N I I I I I I I I I I I I M M I I I ! I I! M I I M I M I I I II I I M I I 1 M 1 I I ! I 
orfl4 6ng-l VGKNGYVPMLAGLTMCMLIGDNGSEWLDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR 

orf146-1.pep FMLADNLADCSKMIAEISNGRRMTRERLEENMAKMRQINARMVKSRSHLAATSGESRISP 

| I || | | M II II I M I I II M I I I I 1 I I I : II : I I I I I I I II I 1 I I I I I I I I M I I M I I 
orfl4 6ng-l FMLADNLADCSKMIAEISNGRRMTRERLEQNMVKMRQINARMVKSRSHLAATSGESRISP 

orf 14 6-1. pep AMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTVALING 
10 : I I I I M i I I M I II I I M M I I I I I I I I I I I I II M I I I I I i I I I M I I I I I I : II I M 

orfl4 6ng-l SMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTAALING 

orf 14 6-1 .pep RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE 
t I I I M I II M M II I I I M II I I f I II M I I M If I I II I M I I I II M I I I 11 I M I I 
15 orfl4 6ng-l RHARR XRIDTAINPE LE AL AE H L H Y QW QG F LWL STNMRQE I S AL V I LLQRTRRKW L D AHE 

orf 146-1. pep RQHLRQ S LLE T RE HGX 
I M I I I M I I I I II M 
O r f 1 4 6ng - 1 RQHLRQ SLLETRE HGX 

Furthermore, ORF146ng-1 shows homology with a hypothetical E. coli protein: 

sp|P33011|YEEA_ECOLI HYPOTHETICAL 40.0 KD PROTEIN IN COBU-SBMC INTERGENIC REGION 
>gi|1736674|gnl|PID|d1016553 (D90838) ORF_ID: o348#20; similar to [SwissProt 
Accession Number P33011] [Escherichia coli] >gi|1736682|gnl|PID|d1016560 (D90839) 
ORF_ID: o348#20; similar to [SwissProt Accession Number P33011] [Escherichia coli] 
>gi|1788318 (AE000292) f352; 100% identical to fragment YEEA_ECOLI SW: P33011 but 
has 203 additional C-terminal residues [Escherichia coli] Length = 352 
Score = 109 bits (271), Expect = 2e-23 
Identities = 89/347 (25%), Positives = 150/347 (42%), Gaps = 21/347 (6%) 



30 



G T V I G LG AG LG V L W LN QH Y FHGN L L F Y L T I G T AS ALAG W AA VG KNG Y V PMLAG L TMCML I 139 
35 " GTV+G GL L L L 4- A L GW A+GK Y +L G+T+ +++ 

GTVLGS ILGLIALQLE LISLPLMLVWCAAAMFLCGWLALGKKPYQGLLIGVTLAIVV 131 



40 



45 



Query: 


20 


Sbjct: 


15 


Query: 


80 


Sbjct: 


75 


Query: 


140 


Sbjct: 


132 


Query: 


200 


Sbjct: 


191 


Query: 


260 


Sbjct: 


248 


Query: 


317 


Sbjct: 


306 



YRH R++H R+ L + RL + W +T+ V++G + F G + A ER+ 
YRH YR I VHGTR VALAFLLT FL 1 1 RL FT I PE S T W PL VTMW I MG P I S FWGN W PRAFE RIG 74 



E +D+ L R+ +V++G + P ++ + WR LA +L + +++ 



+ R RLE ++ K+ VK R +A S E+RI S+ E +Q +R +V 



LN ++R D AL G +N 



50 Query: 317 EALAEHL — HYQWQ GFLWL STNMRQE I SALVI LLQRTRRK 354 

E L + L H+ + G++WL+ ++ L L+ R RK 

EELRQLLNNHHDLKWETPIYGYVWLNMETAHQLELLSNLICRALRK 352 

On the basis of this analysis, including the identification of several transmembrane domains in the 
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
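
The method used to identify the transmembrane domains is not described in this passage. As an illustration only, a sliding-window hydropathy scan is one common way to flag candidate membrane-spanning segments. The sketch below assumes the Kyte-Doolittle hydropathy scale, a 19-residue window and a cut-off of 1.6; these parameters and the function names are assumptions made for the example, not values taken from this document.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of every full window along the sequence.

    Raises KeyError for non-standard symbols such as 'X' or '*'.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_segments(protein, window=19, threshold=1.6):
    """1-based start positions of windows whose mean score reaches the threshold."""
    profile = hydropathy_profile(protein, window)
    return [i + 1 for i, score in enumerate(profile) if score >= threshold]


# Residues 151-180 of ORF146ng-1 <SEQ ID 638>, used here as a short test fragment.
fragment = "LMRAMNVLIG" "AAIAIAAAKL" "LPLKSTLMWR"
print(candidate_segments(fragment))
```

Window starts returned by `candidate_segments` are relative to the fragment; for a full-length protein they correspond directly to residue numbers.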

Example 76 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 639>: 



1 ..GCCGAAGACA CGCGCGTTAC CGCACAGCTT TTGAGCGCGT ACGGCATTCA 
51 GGGCAAACTC GTCAGTGTGC GCGAACACAA CGAACGGCAG ATGGCGGACA 
101 AGATTGTCGG CTATCTTTCA GACGGCATGG TTGTGGCACA GGTTTCCGAT 
151 GCGGGTACGC CGGCCGTGTG CGACCCGGGC GCGAAACTCG CCCGCCGCGT 
201 GCGTGAGGCC GGGTTTAAAG TCGTTCCCGT CGTGGGCGCA AC.GCGGTGA 
251 TGGCGGCTTT GAGCGTGGCC GGTGTGGAAG GATCCGATTT TTATTTCAAC 
301 GGTTTTGTAC CGCCGAAATC GGGAGAACGC AGGAAACTGT TTGCCAAATG 
351 GGTGCGGGCG GCGTTTCCTA TCGTCATGTT TGAAACGCCG CACCGCATCG 
401 GTGCAGCGCT TGCCGATATG GCGGAACTGT TCCCCGAACG CCGATTAATG 
451 CTGGCGCGCG AAATTACGAA AACGTTTGAA ACGTTCTTAA GCGGCACGGT 
501 TGGGGAAATT CAGACGGCAT TGTCTGCCGA CGGCGACCAA TCGCGCGGCG 
551 AGATGGTGTT GGTGCTTTAT CCGGCGCAGG ATGAAAAACA CGAAGGCTTG 
601 TCCGAGTCCG CGCAAAACAT CATGAAAATC CTCACAGCCG AGCTGCCGAC 
651 CAAACAGGCG GCGGAGCTTG CTGCCAAAAT CACGGGCGAG GGAAAGAAAG 
701 CTTTGTACGA T.. 

This corresponds to the amino acid sequence <SEQ ID 640; ORF147>: 

1 ..AEDTRVTAQL LSAYGIQGKL VSVREHNERQ MADKIVGYLS DGMVVAQVSD 

51 AGTPAVCDPG AKLARRVREA GFKVVPVVGA XAVMAALSVA GVEGSDFYFN 

101 GFVPPKSGER RKLFAKWVRA AFPIVMFETP HRIGAALADM AELFPERRLM 

151 LAREITKTFE TFLSGTVGEI QTALSADGDQ SRGEMVLVLY PAQDEKHEGL 

201 SESAQNIMKI LTAELPTKQA AELAAKITGE GKKALYD.. 

Further work revealed the complete nucleotide sequence <SEQ ID 641>: 






1 ATGTTTCAGA AACATTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC 

51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCGGAC ATTACCCTGC 

101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATCTGTGC CGAAGACACG 

151 CGCGTTACCG CACAGCTTTT GAGCGCGTAC GGCATTCAGG GCAAACTCGT 

201 CAGTGTGCGC GAACACAACG AACGGCAGAT GGCGGACAAG ATTGTCGGCT 

251 ATCTTTCAGA CGGCATGGTT GTGGCACAGG TTTCCGATGC GGGTACGCCG 

301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GTGAGGCCGG 

351 GTTTAAAGTC GTTCCCGTCG TGGGCGCAAG CGCGGTGATG GCGGCTTTGA 

401 GCGTGGCCGG TGTGGAAGGA TCCGATTTTT ATTTCAACGG TTTTGTACCG 

451 CCGAAATCGG GAGAACGCAG GAAACTGTTT GCCAAATGGG TGCGGGCGGC 

501 GTTTCCTATC GTCATGTTTG AAACGCCGCA CCGCATCGGT GCGACGCTTG 

551 CCGATATGGC GGAACTGTTC CCCGAACGCC GATTAATGCT GGCGCGCGAA 

601 ATTACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA 

651 GACGGCATTG TCTGCCGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG 

701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCCGCG 

751 CAAAACATCA TGAAAATCCT CACAGCCGAG CTGCCGACCA AACAGGCGGC 
801 GGAGCTTGCT GCCAAAATCA CGGGCGAGGG AAAGAAAGCT TTGTACGATC 

851 TGGCTCTGTC TTGGAAAAAC AAATAG 

This corresponds to the amino acid sequence <SEQ ID 642; ORF147-1>: 



1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT 

51 RVTAQLLSAY GIQGKLVSVR EHNERQMADK IVGYLSDGMV VAQVSDAGTP 

101 AVCDPGAKLA RRVREAGFKV VPVVGASAVM AALSVAGVEG SDFYFNGFVP 

151 PKSGERRKLF AKWVRAAFPI VMFETPHRIG ATLADMAELF PERRLMLARE 

201 ITKTFETFLS GTVGEIQTAL SADGNQSRGE MVLVLYPAQD EKHEGLSESA 

251 QNIMKILTAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with hypothetical protein ORF286 of E.coli (accession number U18997) 
ORF147 and E.coli ORF286 protein show 36% aa identity in 237aa overlap: 

Orf 14 7 : 1 AEDTRVTAQLLSAYGIQGKLVSVREHNERQMADKIVGYLSDGMWAQVSDAGTPAVCDPG 60 

AEDTR T LL +GI +L ++ +HNE+Q A+ ++ L +G +A VSDAGTP + DPG 
Orf 286: 43 AEDTRHTGLLLQHFGINARLFALHDHNEQQKAETLLAKLQEGQNIALVSDAGTPLINDPG 102 

Orf 147 : 61 AKLARRVREXXXXXXXXXXXXXXXXXXXXXXXEGSDFYFNGFVPPKSGERRKLFAKWVRA 120 

L R RE F + GF+P KS RR 

Orf 286: 103 YHLVRTCREAGIRVVPLPGPCAAITALSAAGLPSDRFCYEGFLPAKSKGRRDALKAIEAE 162 

Orf 147: 121 AFPIVMFETPHRIGAALADMAELFPERR-LMLAREITKTFETFLSGTVGEIQTALSADGD 179 

++ +E+ HR+ +L D+ + E R ++LARE+TKT+ET VGE+ + D + 

Orf 286: 163 PRTLIFYESTHRLLDSLEDIVAVLGESRYWLARELTKTWETIHGAPVGELLAWVKEDEN 222 
Orf 147: 180 QSRGEMVLVLYPAQDEKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALY 236 

+ +GEMVL++ + E L A + +L AELP K+AA LAA+I G K ALY 

Orf286: 223 RRKGEMVLIV-EGHKAQEEDLPADALRTLALLQAELPLKKAAALAAEIHGVKKNALY 278 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF147 shows 96.6% identity over a 237aa overlap with ORF75a from strain A of N. meningitidis: 

10 20 30 

orfl47 pep AEDTRVTAQLLSAYGIQGKLVSVREHNERQ 

M I I I I I I I I I I I I I I I I I I M I I I I I I I I 
orf 75a TLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAYGIQGKLVSVREHNERQ 
20 30 40 50 60 70 

40 50 60 70 80 90 

15 orfl47 pep MADKTVGYLS DGMWAQVS DAGT PAVCDPGAKLARRVREAGFK WP WGAXAVMAALS VA 

I I M I M I M I I ! 11 M I I I II ! I II I II I I I 1 ! I I M I : I I I if I I I I I I M I i I I M 
o r f 7 5 a MADKIVGYL S DGMWAQVS DAGT PAVCD PGAKLARRVRE VG FKVV P WGAS AVMAAL S VA 

80 90 100 110 120 130 

20 100 110 120 130 140 150 

orf 147 . pep GVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIVMFETPHRIGAALADMAELFPERRLM 
|| | | | || M I I I I I I II ! I I I I I I I II I : I I I : I I II I I ! I M I : I I I I M ! I I 1 I I i I 
orf 7 5a GVAGSDFYFNGFVPPKSGERRKLFAPCWVRVAFPWMFETPHRIGATLADMAELFPERRLM 
140 150 160 170 180 190 

25 

160 170 180 190 200 210 

orf 147 . pep LAREITKTFETFLSGTVGEIQTALSADGDQSRGEMVLVLYPAQDEKHEGLSESAQNIMKI 
I I I I (I I I I I I I I I I I I I I I I I I I : I I I : I I I I I II I I I I I I I I I I I I I I I II I I I M I I 
orf 7 5a LAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQDEKHEGLSESAQNIMKI 
30 200 210 220 230 240 250 

220 230 
orf 147 .pep LTAELPTKQAAELAAKITGEGKKALYD 
I I I I I I I I I I I I I I 11 I I I I I M I I I I 
35 or f 7 5a LTAELPTKQAAELAAKITGEGKKALYDLALSWKNKX 

260 270 280 290 

ORF147a is identical to ORF75a, which includes aa 56-292 of ORF75. 
Homology with a predicted ORF from N. gonorrhoeae 

ORF147 shows 94.1% identity over a 237aa overlap with a predicted ORF (ORF147ng) from 
N. gonorrhoeae: 

orf 147 . pep AEDTRVTAQLLSAYGIQGKLVSVREHNERQ 30 

II I I I I 1 M I I I I I t I I I : 1 M I II I I I I I 
orfl47ng TLYWATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAYGIQGRLVSVREHNERQ 85 

45 orf 147 .pep MADKI VG YLS DGMWAQVS DAGT PAVC D PGAKLARRVRE AG FKWP WGAXAVMAALS VA 90 

I If I : : I : I I f I : II I II II I I I 1 I I 1 I I I I I I I I I I I I I I I I I I II I M I I I I I I I I I 
orfl47ng MADKV IGFLS DGLVVAQVS DAGT PAVC DPGAKLARRVREAG FKWP WGASAVMAALSVA 14 5 

orf 147 .pep GVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIVMFETPHRIGAALADMAELFPERRLM 150 
50 II I I I I I I I I 11 I I I I I I I I I I I I I I I I I I I : 1 I I I I I I I 1 I I : I I I I I I I I I I I I I ) 

orfl4 7ng GVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPWMFETPHRIGATLADMAELFPERRLM 205 

orf 147 . pep LAREITKTFETFLSGTVGEIQTALSADGDQSRGEMVLVLYPAQDEKHEGLSESAQNIMKI 210 

II 1 I I M I I I I I I I I I I I I I I I I I : II I : I I I I I I I I I I II I I I I I I M ! M II I I I I ! 

55 orf 14 7ng LAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQDEKHEGLSESAQNAMKI 265 

orf 14 7 . pep LTAELPTKQAAELAAKITGEGKKALYD 237 

I : I I I I I I I II I I I 1 I I I M I I I I M I 
orfl4 7ng LAAELPTKQAAELAAKITGEGKKALYDLALSWKNK 300 
An ORF147ng nucleotide sequence <SEQ ID 643> was predicted to encode a protein having amino 
acid sequence <SEQ ID 644>: 

1 MSVFQTAFFM FQKHLQKASD SVVGGTLYVV ATPIGNLADI TLRALAVLQK 

51 ADIICAEDTR VTAQLLSAYG IQGRLVSVRE HNERQMADKV IGFLSDGLVV 

101 AQVSDAGTPA VCDPGAKLAR RVREAGFKVV PVVGASAVMA ALSVAGVAES 

151 DFYFNGFVPP KSGERRKLFA KWVRAAFPVV MFETPHRIGA TLADMAELFP 

201 ERRLMLAREI TKTFETFLSG TVGEIQTALA ADGNQSRGEM VLVLYPAQDE 

251 KHEGLSESAQ NAMKILAAEL PTKQAAELAA KITGEGKKAL YDLALSWKNK 

301 * 

Further work revealed the following gonococcal DNA sequence <SEQ ID 645>: 

1 ATGTTTCAGA AACACTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC 

51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCAGAC ATTACCCTGC 

101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATTTGTGC CGAAGACACG 

151 CGCGTTACTG CGCAGCTTTT GAGCGCGTAC GGCATTCAGG GCAGGTTGGT 

201 CAGTGTGCGC GAACACAACG AGCGGCAGAT GGCGGACAAG GTAATCGGTT 

251 TCCTTTCAGA CGGCCTGGTT GTGGCGCAGG TTTCCGATGC GGGTACGCCG 

301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GCGAAGCAGG 

351 GTTCAAAGTC GTTCCCGTCG TGGGCGCAAG CGCGGTAATG GCGGCGTTGA 

401 GTGTGGCCGG TGTGGCGGAA TCCGATTTTT ATTTCAACGG TTTTGTACCG 

451 CCGAAATCGG GCGAACGTAG GAAATTGTTT GCCAAATGGG TGCGGGCGGC 

501 ATTTCCTGTC GTCATGTTTG AAACGCCGCA CCGAATCGGG GCAACGCTTG 

551 CCGATATGGC GGAATTGTTC CCCGAACGCC GTCTGATGCT GGCGCGCGAA 

601 ATCACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA 

651 GACGGCATTG GCGGCGGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG 

701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCTGCG 

751 CAAAATGCGA TGAAAATCCT TGCGGCCGAG CTGCCGACCA AGCAGGCGGC 

801 GGAGCTTGCC GCCAAGATTA CAGGTGAGGG CAAAAAGGCT TTGTACGATT 

851 TGGCACTGTC GTGGAAAAAC AAATGA 

This corresponds to the amino acid sequence <SEQ ID 646; ORF147ng-1>: 

1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT 

51 RVTAQLLSAY GIQGRLVSVR EHNERQMADK VIGFLSDGLV VAQVSDAGTP 

101 AVCDPGAKLA RRVREAGFKV VPVVGASAVM AALSVAGVAE SDFYFNGFVP 

151 PKSGERRKLF AKWVRAAFPV VMFETPHRIG ATLADMAELF PERRLMLARE 

201 ITKTFETFLS GTVGEIQTAL AADGNQSRGE MVLVLYPAQD EKHEGLSESA 

251 QNAMKILAAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K* 

ORF147ng shows homology to a hypothetical E.coli protein: 

sp|P45528|YRAL_ECOLI HYPOTHETICAL 31.3 KD PROTEIN IN AGAI-MTR INTERGENIC REGION 
(F286) 
>gi|606086 (U18997) ORF_f286 [Escherichia coli] 
>gi|1789535 (AE000395) hypothetical 31.3 kD protein in agai-mtr intergenic region 
[Escherichia coli] Length = 286 
Score = 218 bits (550), Expect = 3e-56 
Identities = 128/284 (45%), Positives = 171/284 (60%), Gaps = 4/284 (1%) 

45 Query: 4 KHLQKAS DS WGGT LY WAT P I GNLAD I TLRALAVLQKADI I CAE DT RVTAQLL S AYG I Q 63 

K Q A +S G LY+V TPIGNLADIT RAL VLQ D+I AEDTR T LL +GI 
KQHQSADNSQ — GQLYIVPTPIGNLADITQRALEVLQAVDLIAAEDTRHTGLLLQHFGIN 59 

» j. GRLV S VREHNERQMADKVI GFL S DGL WAQVS DAGT PAVCD PGAKLARRVRE AG FKW PV 123 

50 RL ++ +HNE+Q A+ ++ L +G +A VSDAGTP + DPG L R REAG +VVP+ 



55 



60 



Query: 


4 


Sbjct : 


2 


Query: 


64 


Sbjct: 


60 


Query: 


124 


Sbjct: 


120 


Query: 


184 


Sbjct: 


180 


Query: 


243 



G A + ALS AG+ F + GF+P KS RR ++ +E+ HR+ +L 

?GPCAAITALSAAGLPSDRFCYEGFLPAKSKGRRDALKAIEAEPRTLIFYESTHRLLDSL 

^DMAELFPERR-LMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQDEK 
D+ + E R ++LARE+TKT+ET VGE+ + D N+ +GEMVL++ + 

EDIVAVLGESRYWLARELTKTWETIHGAPVGELLAWVKEDENRRKGEMVLIV-EGHKAQ 

Query: 243 HEGLSE SAQNAMKI LAAELPTKQAAELAAKI TGEGKKALYDLAL 286 
EL A + +L AELP K+AA LAA+I G K ALY AL 
Sbjct: 239 EEDLPADALRTLALLQAELPLKKAAALAAEIHGVKKNALYKYAL 282 

Based on the computer analysis and the presence of a putative transmembrane domain in the 
gonococcal protein, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 77 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 647>: 



1 

51 
101 
151 
201 
251 
301 
351 
401 
451 
501 
551 
601 
651 
701 
751 
801 
851 
901 
951 
1001 
1051 
1101 
1151 
1201 
1251 
1301 

2101 
2151 
2201 
2251 
2301 
2351 
2401 
2451 
2501 
2551 
2601 
2651 
2701 
2751 
2801 
2851 
2901 
2951 
3001 

3551 
3601 
3651 
3701 
3751 
3801 



ATGAAAACAA 
AACCGGTCGC 
TCGGCATTCT 
TACCAATACT 
GGCGAAAGAT 
CAATGACAAA 
GTGGCGGcAT 
GCGGCTATAA 
CAACAwCGww 
GACTAAAGGC 
AATwTGTCAC 
CGGAAATATA 
AGGCAGGCAA 
GTTCATATCA 



AAAGTGGTTA 
AAAGCAATGG 
TTTGCTGGAG 
ATACTCTTTT 
ATGAACACAA 
TTTAATGTTT 
AGGTGGTGTC 
CCTTTATTGA 
CAAGGTGCTG 
AAATAACGAA 
CCGTTACTTG 
GGCAAAGGCA 



CCGACAAACG 
ATCCGCTTCT 
TCCCCAAGCC 
ATCGCGACTT 
ATTGAGGTTT 
AGCCCCGATG 
TGGTGGGCGt 
CAACGTTGAT 
TTACTTATAA 
CATCCTTATG 
AGATGCAGAA 
TCGATCAAAA 
TATTGGCGAT 
TATTGCAAGT 

GGCTC 

ATTAATGGGG 
CTTCCAGCTG 
ATACCCATTC 
AACGACGATA 
TTCTCTGCCT 
CTTTATCCGA 
AACAGTTATC 
CGAAGGAAAA 
GAGGATTATA 
ACTTGGCAAG 
GAAAGTAAAC 
CGCTG 



GACAACCGAA 
C . GCTGCTTA 
TGGGCGGGAC 
TGCCGAAAAT 
ACAACAAAAA 
ATTGATTTTT 
ATCAATATAT 
TTTGGTGCGG 
AATTGTGAAA 
GCGGCGATTA 
CCTGTTGAAA 
TAATTACCCT 
CTGATGAAGA 



ACACACCGCA 
CTTAGCCATA 
ACACTTATTT 
AAAGGCAAGT 
AGGGGAGTTG 
CTGTGGTGTC 
TGTGAGCGTG 
AAGGAAk . AA 
CGGAATAATT 
TCATATGCCG 
TGACCAGTTA 
GACCGTGTTC 
TGAGCCCAAT 



AAGCCCCGAA 
TGCCTGTCGT 
CGGCATCAAC 
TTGCAGTCGG 
GTCGGCAAAT 
GCGTAACGGC 
GCACATAACG 
tATCCC . GAT 
ATAAAGCAGG 
CGTTTGCATA 
TATGGATGGG 
GTATTGGGGC 
AACCGCGAAA 



ACCAATGTTT 
TATTGCAAAC 
GTTCGTAAAG 
AGTATTCTAC 
ATAATGGCAC 
AATAGATTAA 
GACAGCAAGA 
GACCCAGACT 
GGCGAATTGA 
TTTCCAAGGA 
GCGCGGGCGT 
GGCGTGGCAA 



ATCTATGATG 
GGGCAACCCC 
ATTGGTTCTA 
GAACCACGTC 
AGGAAAAATC 
AAACACGAAC 
GAACCTGTTT 
GAATAATGGA 
TACTTACCAG 
GATTTTACGG 
TCATATCAGT 
ACGACCGCCT 



CCCAAAAGCA 
TATATAGGAA 
TGATGAAATC 
AAAATGGGAA 
AATGCCAAAC 
CGTTCAATTG 
ATCATGCTGC 
GAAAATATTT 
CAACATCAAT 
TCTCGCCTGA 
GAAGACAGTA 
GTCCAAAATC 



// 



TGACTGCTTC 
GATCACGCTC 
TAGTGCAAAT 
ACGGCAACCk 
ACATTAAACG 
CGACCACGCC 
CAAACGTAAG 
GCAGTATTCC 
CAagGATACG 
GarCGGAATT 
TCCGCCTATC 
TGCGCCGCGC 
CACCGCCAAC 
AAATTGAACG 
CCGCAGCGAC 
TGGCGGTCAA 
GTAGTGGAAG 
CCTGCAAAAC 



ATTGACTAAG 
ATTTAAATCT 
GGCGATACAC 
TAgCCtCGtG 
GCAACACATC 
GTACAAAACG 
CCATTCCGCA 
ATTTTGAAAG 
GCATTACACT 
AGGCAATTTA 
GCCACGATGC 
CGCCGTTCGC 
TTCGGTAGAA 
GTCAGGGAAC 
AAATTGAAGC 
CAATACCGGC 
GAAAAGACAA 
GAACACGTCG 



CCGCAACGCC 
CGCAAGATTT 
ATGCAGAAAA 
CCGGACCGAA 
CCCACGGCGC 



GTTTGGACAA 
CCGCGCCTAC 
ACCTCGGCAG 
AACACCTTCG 
CGTTTTCGGG 



ACCGACATCA 
CACAGGGCTT 
GTTATACAGT 
G.sAATGcCC 
GGCTTCgGGC 
GCAGTCTGAC 
CTCAACGGTA 
CAGCCGCTTT 
TAAAAGACAG 
AACCTTGACA 
GGCAGGGGCG 
GCCGTTCGCG 
TCCCGTTTCA 
ATTCCGCTTT 
TGGCGGAAAG 
AACGAACCTG 
CAAACCGCTG 
ATGCAGGCGC 
// 

, . . . TTAGAC 
GCGGCATCCG 
CGCCAACAAA 
CGGGCGCGTC 
ACGACGGCAT 
CAATACGGCA 



GCGGCAATGT 
GCCACACTCA 
CAGCCACAAC 
AAGCAACATT 
AATGCTT.CAT 
GCTTTCCGGC 
ATGTCTCCCT 
ACCGGACAAA 
CGAATGGACG 
ACGCCACCAT 
CAAACCGGCA 
CCGTTCCCTA 
ACACGCTGAC 
ATGTCGGAAC 
TTCCGAAGGC 
CAAGCCTCGA 
TCCGAAAACC 
GTGG 



. . . GATAAAG 
CGATCTTGCC 
ACGGCAATCT 
GCCACCCAAA 
TAATCAAGCC 
TTAATCTAAG 
AACGCTAAGG 
AGCCGATAAG 
TCAGCGGCGG 
CTGCCGTCAg 
TACaCTCAAT 
GTGCGACAGA 
TTATmCGTTA 
GGTAAACGGC 
TCTTCGGCTA 
ACTTACACCT 
ACAATTGACG 
TTAATTTCAC 



CGCGTATTTG 
GGACACCAAA 
CCGACCTGCG 
GGCATCCTGT 
CGGCAACTCG 
TCGACAGGTT 



CCGAAGACCG 
CACTACCGTT 
CCAAATCGGT 
TTTCGCACAA 
GCACGGCTTG 
CTACATCGGC 
3851 ATCAGnCGCG GGCGCGGGTT TTAGCAGCGG CAGCCTTTcA GACGGCATCG

3901 GAGsmAAAwT CCGCCGCCGC GTGCtGCATT ACGGCATTCA GGCACGAtAC 

3951 CGCGCCGgtt tCggCGgATt CGGCATCGAA CCGCACATCG GCGCAACGCg 

4001 ctATTTCGTC CAAAAAGCGG ATTACCGCTA CGAAAACGTC AATATCGCCA 

4051 CCCCCGGCCT TGCATTCAAC CGcTACCGCG CGGGCATTAa GGCAGATTAT

4101 TCATTCAAAC CGGCGCAACA CATTTCCATC ACGCCTTATT TGAGCCTGTC 

4151 CTATACCGAT GCCGCTTCGG GCAAAGTCCG AACACGCGTC AATACCGCCG 

4201 TATTGGCTCA GGATTTCGGC AAAACCCGCA GTGCGGAATG GGgCGTAAAC

4251 GCCGAAATCA AAGGTTTCAC GCTGTCCCTC CACGCTGCCG CCGCCAAAGG

4301 CCCGCAACTG GAAGCGCAAC ACAGCGCGGG CATCAAATTA GGCTACCGCT

4351 GGTAA...

This corresponds to the amino acid sequence <SEQ ID 648; ORF1>:

1 MKTTDKRTTE THRKAPKTGR IRFXAAYLAI CLSFGILPQA WAGHTYFGIN 

51 YQYYRDFAEN KGKFAVGAKD IEVYNKKGEL VGKSMTKAPM IDFSVVSRNG

101 VAALVGVQYI VSVAHNGGYN NVDFGAEGXN IXDQXRXTYK IVKRNNYKAG 

151 TKGHPYGGDY HMPRLHKXVT DAEPVEMTSY MDGRKYIDQN NYPDRVRIGA 

201 GRQYWRSDED EPNNRESSYH IAS GS PMFIYDAQKQ 

251 KWLINGVLQT GNPYIGKSNG FQLVRKDWFY DEIFAGDTHS VFYEPRQNGK 

301 YSFNDDNNGT GKINAKHEHN SLPNRLKTRT VQLFNVSLSE TAREPVYHAA 

351 GGVNSYRPRL NNGENISFID EGKGELILTS NINQGAGGLY FQGDFTVSPE

401 NNETWQGAGV HISEDSTVTW KVNGVANDRL SKIGKGTL. 

// 

701 DKVTAS LTKTDISGNV DLADHAHLNL TGLATLNGNL 

751 SANGDTRYTV SHNATQNGNX SLVXNAQATF NQATLNGNTS ASGNASFNLS 

801 DHAVQNGSLT LSGNAKANVS HSALNGNVSL ADKAVFHFES SRFTGQISGG 

851 KDTALHLKDS EWTLPSGXEL GNLNLDNATI TLNSAYRHDA AGAQTGSATD 

901 APRRRSRRSR RSLLXVTPPT SVESRFNTLT VNGKLNGQGT FRFMSELFGY 

951 RSDKLKLAES SEGTYTLAVN NTGNEPASLE QLTVVEGKDN KPLSENLNFT

1001 LQNEHVDAGA W 

// 

1151 LDRVFAEDR 

1201 RNAVWTSGIR DTKHYRSQDF RAYRQQTDLR QIGMQKNLGS GRVGILFSHN 

1251 RTENTFDDGI GNSARLAHGA VFGQYGIDRF YIGISAGAGF SSGSLSDGIG 

1301 XKXRRRVLHY GIQARYRAGF GGFGIEPHIG ATRYFVQKAD YRYENVNIAT 

1351 PGLAFNRYRA GIKADYSFKP AQHISITPYL SLSYTDAASG KVRTRVNTAV 

1401 LAQDFGKTRS AEWGVNAEIK GFTLSLHAAA AKGPQLEAQH SAGIKLGYRW 

1451 * 

Further sequencing analysis revealed the complete nucleotide sequence <SEQ ID 649>: 

1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCGAA 

51 AACCGGCCGC ATCCGCTTCT CGCCTGCTTA CTTAGCCATA TGCCTGTCGT 

101 TCGGCATTCT TCCCCAAGCC TGGGCGGGAC ACACTTATTT CGGCATCAAC 

151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG 

201 GGCGAAAGAT ATTGAGGTTT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT 

251 CAATGACAAA AGCCCCGATG ATTGATTTTT CTGTGGTGTC GCGTAACGGC 

301 GTGGCGGCAT TGGTGGGCGA TCAATATATT GTGAGCGTGG CACATAACGG 

351 CGGCTATAAC AACGTTGATT TTGGTGCGGA AGGAAGAAAT CCCGATCAAC 

401 ATCGTTTTAC TTATAAAATT GTGAAACGGA ATAATTATAA AGCAGGGACT

451 AAAGGCCATC CTTATGGCGG CGATTATCAT ATGCCGCGTT TGCATAAATT 

501 TGTCACAGAT GCAGAACCTG TTGAAATGAC CAGTTATATG GATGGGCGGA 

551 AATATATCGA TCAAAATAAT TACCCTGACC GTGTTCGTAT TGGGGCAGGC 

601 AGGCAATATT GGCGATCTGA TGAAGATGAG CCCAATAACC GCGAAAGTTC 

651 ATATCATATT GCAAGTGCGT ATTCTTGGCT CGTTGGTGGC AATACCTTTG 

701 CACAAAATGG ATCAGGTGGT GGCACAGTCA ACTTAGGTAG TGAAAAAATT

751 AAACATAGCC CATATGGTTT TTTACCAACA GGAGGCTCAT TTGGCGACAG

801 TGGCTCACCA ATGTTTATCT ATGATGCCCA AAAGCAAAAG TGGTTAATTA 

851 ATGGGGTATT GCAAACGGGC AACCCCTATA TAGGAAAAAG CAATGGCTTC 

901 CAGCTGGTTC GTAAAGATTG GTTCTATGAT GAAATCTTTG CTGGAGATAC 

951 CCATTCAGTA TTCTACGAAC CACGTCAAAA TGGGAAATAC TCTTTTAACG 

1001 ACGATAATAA TGGCACAGGA AAAATCAATG CCAAACATGA ACACAATTCT 

1051 CTGCCTAATA GATTAAAAAC ACGAACCGTT CAATTGTTTA ATGTTTCTTT 

1101 ATCCGAGACA GCAAGAGAAC CTGTTTATCA TGCTGCAGGT GGTGTCAACA 

1151 GTTATCGACC CAGACTGAAT AATGGAGAAA ATATTTCCTT TATTGACGAA 

1201 GGAAAAGGCG AATTGATACT TACCAGCAAC ATCAATCAAG GTGCTGGAGG 

1251 ATTATATTTC CAAGGAGATT TTACGGTCTC GCCTGAAAAT AACGAAACTT 

1301 GGCAAGGCGC GGGCGTTCAT ATCAGTGAAG ACAGTACCGT TACTTGGAAA 

1351 GTAAACGGCG TGGCAAACGA CCGCCTGTCC AAAATCGGCA AAGGCACGCT 
1401 GCACGTTCAA GCCAAAGGGG AAAACCAAGG CTCGATCAGC GTGGGCGACG
1451 GTACAGTCAT TTTGGATCAG CAGGCAGACG ATAAAGGCAA AAAACAAGCC
1501 TTTAGTGAAA TCGGCTTGGT CAGCGGCAGG GGTACGGTGC AACTGAATGC
1551 CGATAATCAG TTCAACCCCG ACAAACTCTA TTTCGGCTTT CGCGGCGGAC
1601 GTTTGGATTT AAACGGGCAT TCGCTTTCGT TCCACCGTAT TCAAAATACC
1651 GATGAAGGGG CGATGATTGT CAACCACAAT CAAGACAAAG AATCCACCGT
1701 TACCATTACA GGCAATAAAG ATATTGCTAC AACCGGCAAT AACAACAGCT
1751 TGGATAGCAA AAAAGAAATT GCCTACAACG GTTGGTTTGG CGAGAAAGAT
1801 ACGACCAAAA CGAACGGGCG GCTCAACCTT GTTTACCAGC CCGCCGCAGA
1851 AGACCGCACC CTGCTGCTTT CCGGCGGAAC AAATTTAAAC GGCAACATCA
1901 CGCAAACAAA CGGCAAACTG TTTTTCAGCG GCAGACCAAC ACCGCACGCC
1951 TACAATCATT TAAACGACCA TTGGTCGCAA AAAGAGGGCA TTCCTCGCGG
2001 GGAAATCGTG TGGGACAACG ACTGGATCAA CCGCACATTT AAAGCGGAAA
2051 ACTTCCAAAT TAAAGGCGGA CAGGCGGTGG TTTCCCGCAA TGTTGCCAAA
2101 GTGAAAGGCG ATTGGCATTT GAGCAATCAC GCCCAAGCAG TTTTTGGTGT
2151 CGCACCGCAT CAAAGCCACA CAATCTGTAC ACGTTCGGAC TGGACGGGTC
2201 TGACAAATTG TGTCGAAAAA ACCATTACCG ACGATAAAGT GATTGCTTCA
2251 TTGACTAAGA CCGACATCAG CGGCAATGTC GATCTTGCCG ATCACGCTCA
2301 TTTAAATCTC ACAGGGCTTG CCACACTCAA CGGCAATCTT AGTGCAAATG
2351 GCGATACACG TTATACAGTC AGCCACAACG CCACCCAAAA CGGCAACCTT
2401 AGCCTCGTGG GCAATGCCCA AGCAACATTT AATCAAGCCA CATTAAACGG
2451 CAACACATCG GCTTCGGGCA ATGCTTCATT TAATCTAAGC GACCACGCCG
2501 TACAAAACGG CAGTCTGACG CTTTCCGGCA ACGCTAAGGC AAACGTAAGC
2551 CATTCCGCAC TCAACGGTAA TGTCTCCCTA GCCGATAAGG CAGTATTCCA
2601 TTTTGAAAGC AGCCGCTTTA CCGGACAAAT CAGCGGCGGC AAGGATACGG
2651 CATTACACTT AAAAGACAGC GAATGGACGC TGCCGTCAGG CACGGAATTA
2701 GGCAATTTAA ACCTTGACAA CGCCACCATT ACACTCAATT CCGCCTATCG
2751 CCACGATGCG GCAGGGGCGC AAACCGGCAG TGCGACAGAT GCGCCGCGCC
2801 GCCGTTCGCG CCGTTCGCGC CGTTCCCTAT TATCCGTTAC ACCGCCAACT
2851 TCGGTAGAAT CCCGTTTCAA CACGCTGACG GTAAACGGCA AATTGAACGG
2901 TCAGGGAACA TTCCGCTTTA TGTCGGAACT CTTCGGCTAC CGCAGCGACA
2951 AATTGAAGCT GGCGGAAAGT TCCGAAGGCA CTTACACCTT GGCGGTCAAC
3001 AATACCGGCA ACGAACCTGC AAGCCTCGAA CAATTGACGG TAGTGGAAGG
3051 AAAAGACAAC AAACCGCTGT CCGAAAACCT TAATTTCACC CTGCAAAACG
3101 AACACGTCGA TGCCGGCGCG TGGCGTTACC AACTCATCCG CAAAGACGGC
3151 GAGTTCCGCC TGCATAATCC GGTCAAAGAA CAAGAGCTTT CCGACAAACT
3201 CGGCAAGGCA GAAGCCAAAA AACAGGCGGA AAAAGACAAC GCGCAAAGCC
3251 TTGACGCGCT GATTGCGGCC GGGCGCGATG CCGTCGAAAA GACAGAAAGC
3301 GTTGCCGAAC CGGCCCGGCA GGCAGGCGGG GAAAATGTCG GCATTATGCA
3351 GGCGGAGGAA GAGAAAAAAC GGGTGCAGGC GGATAAAGAC ACCGCCTTGG
3401 CGAAACAGCG CGAAGCGGAA ACCCGGCCGG CTACCACCGC CTTCCCCCGC
3451 GCCCGCCGCG CCCGCCGGGA TTTGCCGCAA CTGCAACCCC AACCGCAGCC
3501 CCAACCGCAG CGCGACCTGA TCAGCCGTTA TGCCAATAGC GGTTTGAGTG
3551 AATTTTCCGC CACGCTCAAC AGCGTTTTCG CCGTACAGGA CGAATTAGAC
3601 CGCGTATTTG CCGAAGACCG CCGCAACGCC GTTTGGACAA GCGGCATCCG
3651 GGACACCAAA CACTACCGTT CGCAAGATTT CCGCGCCTAC CGCCAACAAA
3701 CCGACCTGCG CCAAATCGGT ATGCAGAAAA ACCTCGGCAG CGGGCGCGTC
3751 GGCATCCTGT TTTCGCACAA CCGGACCGAA AACACCTTCG ACGACGGCAT
3801 CGGCAACTCG GCACGGCTTG CCCACGGCGC CGTTTTCGGG CAATACGGCA
3851 TCGACAGGTT CTACATCGGC ATCAGCGCGG GCGCGGGTTT TAGCAGCGGC
3901 AGCCTTTCAG ACGGCATCGG AGGCAAAATC CGCCGCCGCG TGCTGCATTA
3951 CGGCATTCAG GCACGATACC GCGCCGGTTT CGGCGGATTC GGCATCGAAC
4001 CGCACATCGG CGCAACGCGC TATTTCGTCC AAAAAGCGGA TTACCGCTAC
4051 GAAAACGTCA ATATCGCCAC CCCCGGCCTT GCATTCAACC GCTACCGCGC
4101 GGGCATTAAG GCAGATTATT CATTCAAACC GGCGCAACAC ATTTCCATCA
4151 CGCCTTATTT GAGCCTGTCC TATACCGATG CCGCTTCGGG CAAAGTCCGA
4201 ACACGCGTCA ATACCGCCGT ATTGGCTCAG GATTTCGGCA AAACCCGCAG
4251 TGCGGAATGG GGCGTAAACG CCGAAATCAA AGGTTTCACG CTGTCCCTCC
4301 ACGCTGCCGC CGCCAAAGGC CCGCAACTGG AAGCGCAACA CAGCGCGGGC
4351 ATCAAATTAG GCTACCGCTG GTAA



This corresponds to the amino acid sequence <SEQ ID 650; ORF1-1>:



1 MKTTDKRTTE THRKAPKTGR IRFSPAYLAI CLSFGILPQA WAGHTYFGIN

51 YQYYRDFAEN KGKFAVGAKD IEVYNKKGEL VGKSMTKAPM IDFSVVSRNG

101 VAALVGDQYI VSVAHNGGYN NVDFGAEGRN PDQHRFTYKI VKRNNYKAGT 

151 KGHPYGGDYH MPRLHKFVTD AEPVEMTSYM DGRKYIDQNN YPDRVRIGAG 

201 RQYWRSDEDE PNNRESSYHI ASAYSWLVGG NTFAQNGSGG GTVNLGSEKI 

251 KHSPYGFLPT GGSFGDSGSP MFIYDAQKQK WLINGVLQTG NPYIGKSNGF 

301 QLVRKDWFYD EIFAGDTHSV FYEPRQNGKY SFNDDNNGTG KINAKHEHNS 
351 LPNRLKTRTV QLFNVSLSET AREPVYHAAG GVNSYRPRLN NGENISFIDE

401 GKGELILTSN INQGAGGLYF QGDFTVSPEN NETWQGAGVH ISEDSTVTWK

451 VNGVANDRLS KIGKGTLHVQ AKGENQGSIS VGDGTVILDQ QADDKGKKQA 

501 FSEIGLVSGR GTVQLNADNQ FNPDKLYFGF RGGRLDLNGH SLSFHRIQNT 

551 DEGAMIVNHN QDKESTVTIT GNKDIATTGN NNSLDSKKEI AYNGWFGEKD 

601 TTKTNGRLNL VYQPAAEDRT LLLSGGTNLN GNITQTNGKL FFSGRPTPHA 

651 YNHLNDHWSQ KEGIPRGEIV WDNDWINRTF KAENFQIKGG QAVVSRNVAK

701 VKGDWHLSNH AQAVFGVAPH QSHTICTRSD WTGLTNCVEK TITDDKVIAS 

751 LTKTDISGNV DLADHAHLNL TGLATLNGNL SANGDTRYTV SHNATQNGNL

801 SLVGNAQATF NQATLNGNTS ASGNASFNLS DHAVQNGSLT LSGNAKANVS 

851 HSALNGNVSL ADKAVFHFES SRFTGQISGG KDTALHLKDS EWTLPSGTEL 

901 GNLNLDNATI TLNSAYRHDA AGAQTGSATD APRRRSRRSR RSLLSVTPPT 

951 SVESRFNTLT VNGKLNGQGT FRFMSELFGY RSDKLKLAES SEGTYTLAVN 

1001 NTGNEPASLE QLTVVEGKDN KPLSENLNFT LQNEHVDAGA WRYQLIRKDG

1051 EFRLHNPVKE QELSDKLGKA EAKKQAEKDN AQSLDALIAA GRDAVEKTES 

1101 VAEPARQAGG ENVGIMQAEE EKKRVQADKD TALAKQREAE TRPATTAFPR 

1151 ARRARRDLPQ LQPQPQPQPQ RDLISRYANS GLSEFSATLN SVFAVQDELD 

1201 RVFAEDRRNA VWTSGIRDTK HYRSQDFRAY RQQTDLRQIG MQKNLGSGRV

1251 GILFSHNRTE NTFDDGIGNS ARLAHGAVFG QYGIDRFYIG ISAGAGFSSG 

1301 SLSDGIGGKI RRRVLHYGIQ ARYRAGFGGF GIEPHIGATR YFVQKADYRY 

1351 ENVNIATPGL AFNRYRAGIK ADYSFKPAQH ISITPYLSLS YTDAASGKVR 

1401 TRVNTAVLAQ DFGKTRSAEW GVNAEIKGFT LSLHAAAAKG PQLEAQHSAG 

1451 IKLGYRW*
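
The correspondence between the nucleotide listing <SEQ ID 649> and the amino acid listing <SEQ ID 650> can be checked with a short translation routine. The following is a minimal sketch, assuming the standard genetic code and reading frame 1; the function name `translate` and the use of `X` for any codon containing an ambiguous base are illustrative choices, not part of the original disclosure.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string in frame 1; blanks between 10-mers are ignored.

    Codons that are not plain A/C/G/T triplets (for example those holding
    ambiguity codes such as N) are reported as 'X'. A trailing partial
    codon is dropped.
    """
    seq = "".join(dna.split()).upper()
    residues = []
    for i in range(0, len(seq) - 2, 3):
        residues.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(residues)


# First listing row of <SEQ ID 649> (positions 1-50).
first_row = "ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCGAA"
print(translate(first_row))  # MKTTDKRTTETHRKAP
```

Applied to the first row, the sketch reproduces the first sixteen residues of <SEQ ID 650>; the final five codons of the listing (GGC TAC CGC TGG TAA) give GYRW followed by the stop symbol.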

Computer analysis of these sequences gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF1 shows 57.8% identity over a 1456aa overlap with an ORF (ORF1a) from strain A of N. meningitidis:

10 20 30 40 50 60 

orf1.pep MKTTDKRTTETHRKAPKTGRIRFXAAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAEN
I i | M I I II II i I I I II I M i i t I I I I I I I i I I I I I I I I t I II It II II I I II 1 I I I I 
orf1a MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAEN

10 20 30 40 50 60 



70 80 90 100 110 120 

orf1.pep KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSVVSRNGVAALVGVQYIVSVAHNGGYN
I I I i I I I I i i I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I M I I I I I I I I I I I i I I 
orf1a KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSVVSRNGVAALVGDQYIVSVAHNGGYN
70 80 90 100 110 120 



130 140 150 160 170 180 

orf 1 . pep NVDFGAEGXNIXDQXRXTYKIVKRNNYKAGTKGHPYGGDYHMPRLHKXVTDAEPVEMTSY 

I I I I II I I M II I : I : I II I I I I I : : I I I : I I I I I I I I I I I I I I I I I I II 
orf la NVDFGAEGXN-PDQHRFSYQIVKRNNYKPDNS-HPYNGDXHMPRLHKFVTDAEPVEMTSD 
130 140 150 160 170 



190 200 210 

orf 1 . pep MDGRK Y I DQNN Y PDRVR I GAGRQYWRS DE DE P NN 

II I | : : : I I : I ill I : I : : I I I I : I : II 
orf la MRGNTYSDKEKYPERVRIGSGHHYWRYDDDKHGDLSYSGAWLIGGNTHMQGWGNNGVXSL 
180 190 200 210 220 230 



220 230 240 250 260 

orfl.pep RESSYH IA SGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGFQLVRK 

I : : : : II I I I I I I I I I : : I M : I I I I I I I I I I : I I I I I : I I 

orf1a SGDVRHANDYGPMPIAGAAGDSGSPMFIYDKTNNKWLLNGVLQTGYPYSGRENGFQLIRK
240 250 260 270 280 290 



270 280 290 300 310 320 

orf1.pep DWFYDEIFAGDTHSVFYEPRQNGKYSFNDDNNGTGKINAKHEHNSLPNRLKTRTVQLFNV

11111:1: 1(11:1 : 1 I I : I 1 : : I I : : : I I i I 1 : : : I : I I : f I : : 1 I : I 1 : 
orf la DWFYDDIYRGDTHTVXFEPRSNGHFSFTSNNNGTGTVTETNEKVSNP-KLKVQTVRLFDE 
300 310 320 330 340 350 



330 340 350 360 370 380 
orf1.pep SLSETAREPVYHAAGGVNSYRPRLNNGENISFIDEGKGELILTSNINQGAGGLYFQGDFT

||:|| | I I II I: I I I I I 111 11:11 I I I : 1 : I I 1 : : I I M M I I I II : I III 

orfla SLNETDKEPVY-AAGGVNQYRPRLNNGENLSFIDYGNGKLILSNNINQGAGGLYFEGDFT 
360 370 380 390 400 410 

390 400 410 420 430

orfl pep VSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGTL 

I 1 ! I I I I I M II ! 1 I I I I M II 1 1 I I 1 M I I I I I I 1 I I 1 I M 
orfla VSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGTLHVQAKGENQGSISVGDGT 
420 430 440 450 460 470






orfl . pep 



orf1a VILDQQADDKGKKQAFSEIGLXSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNGHSLSFH

480 490 500 510 520 530 



orfl .pep 



orfla RIQNTDEGAMIXXHNATTTSTVTITGNESITQPSGKNINRLNYSKEIAYNGWFGEKDTTK 
540 550 560 570 580 590 



orf1.pep



orfla TNGRLNLVYQPAAEDRTXLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNBLGSGWSKMEG 
600 610 620 630 640 650 



orfl .pep 



orfla IPQGEIVWDWDWIXRTFKAENFHIQGGQAVISRNVAKVEGDXHLSNHAQAVFGVAPHQSH 
660 670 680 690 700 710 



440 450 460 470 480 

orfl . pep XXXXXDKVTASLTKTDISGNVDLADHAHLNLTGLATLNGNLSAN 

: 11:1111111111)111 I : I I : I ) I I I I I 

orfla TICTRSDWTGLTNCVEXXITDDKVIASLTKTDXSGXVXLXXXXXXXLXGXAXLXGNLSAN 
720 730 740 750 760 770

490 500 510 520 530 540 

orfl . pep GDTRYTVSHNATQNGNXSLVXNAQATFNQATLNGNTSASGNASFNLSDHAVQNGSLTLSG 
I II 1 I I 11 I I I 1 I 1 I 1 III 111111)1111111:1 I I I I I I I 1 1 : : I : I I I I 1 I 1 ) 
orf1a GDTRYTVSHNATQNGNLSLVGNAQATFNQATLNGNXSXSGNASFNLSNNAAQNGSLTLSD

780 790 800 810 820 830 

550 560 570 580 590 600 

orfl . pep NAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWTLPSGXELGNL 
50 I I I I I I I ) ) I I I I I 1 I I 1 I I 1 I I I I I : 1 II I I I : M : I I I 1 I I M I I I 1 1 I I 1 : I 11 f I 

orfla NAKANVSHSALNGNVSLADKAVFHFENSRFTGQLSGSKXTALHLKDSEWTLPSGTELGNL 
840 850 860 870 880 890 

610 620 630 640 650 660 

orf1.pep NLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRSLLXVTPPTSVESRFNTLTVNG

I 1 I 1 II I 1 I I I 1 I i I i 1 I I II 1 I : : i : I I I I I 1 I I II II I M II 1 I I I 11 I I I 11 

orfla NLDNATITLNSAYRHDAAGAQTGXVSDTPRRRSRRS LLSVTPPTSVESRFNTLTVNG 

900 910 920 930 940 950 

670 680 690 700 710 720

orfl . pep KLNGQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPASLEQLTVVEGKDNKPL 
111 I I I I 11 I I I I I I 1 I M I 11 I 1 I 1 f II I 1 I I I 1 II 1 I 1 I I : I I : 1 I I 1 I II I I 1 II 1 
orfla KLNXQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPVSLDQLTVVEGKDNKPL 
960 970 980 990 1000 1010 


730 740 750 

orfl . pep S EN LN FT LQN E H V DAG AW 

I I I I I I I II I I I I I I I I I 

orfla SENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAEAKKQAEKDNAQS 
1020 1030 1040 1050 1060 1070
orf1.pep

o r f 1 a LDAL I AAGRDAAEKTE S VAE PARXAGGENVGIMQAEEEKKRVQADKDSALAKQREAETRP 

1080 1090 1100 1110 1120 1130 



760 

orfl.pep LDR 

orfla XTTAFPRARXARRDLPQPQPQPQPQPQPQRDLXSRYANSGLSEFSATLNSVFAVQDELDR 
1140 1150 1160 1170 1180 1190 

770 780 790 800 810 820 

orf 1 . pep VFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFSHNRTEN 
I I | | | | | | | | | M I I t M M I I I 1 1 I I I i ( i M I I i I i < I 1 I ( I I I I t I I I I M I I I I 
orfla V FAE DRRN AVWT S X I RXTKH YR SQ D FRAYRQQT DLRQ I GMQKN LG S GRVG I L FS HNRTEN 

1200 1210 1220 1230 1240 1250 

830 840 850 860 870 880 

orfl pep TFDDGIGNSARLAHGAVFGQYGIDRFYIGISAGAGFSSGSLSDGIGXKXRRRVLHYGIQA 
: | | 1 I I I II I I I I 1 I I! I II I ! I II II ! I : I I I I I M (Mill I I I I I I I I I I I I 
orfla XFDDGIGNSARLAHGAVFGQYGIGRFDIGISTGAGFSSGXLSDGIGGKIRRRVLHYGIQA 
1260 1270 1280 1290 1300 1310 

890 900 910 920 930 940 

orfl . pep R YRAG FGG FG I E PH I GATR Y FVQKAD YR YEN VN I AT PGLA FNR YRAG I KAD Y S FK PAQH I 

I I I I I I I I) I II I : 1 I I I I I ! I I I I I I I I I I N 1 I I I I I I I 1 I I I I II I I I I M I 1 i I I 
orfla R YRAG FGG FG I E P Y I GATRYFVQKAD YRYENVN I AT PGLA FNR YRAG I KAD YS FK PAQHX 

1320 1330 1340 1350 1360 1370 

950 960 970 980 990 1000 

orfl . pep SITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEIKGFTLSLHAAAAKGP 
I I I I I I I I 1 I 1 I I I I II 1 I I I I I I I I I I I I I I I I I I I 1 M I I I I I I II I I II I I I I II 
orfla SITPYXSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEIKGFTLSXHAAAAKGP 
1380 1390 1400 1410 1420 1430 



1010 1020 

orfl . pep QLEAQHSAGIKLGYRWX 
11 I I I I I I I I I I I I I 1 I 

orfla QLEAQHS AG I KLGYRWX 
1440 1450 
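
The percent identity quoted for an alignment of this kind is the share of aligned columns in which both sequences carry the same residue. The following is a minimal sketch of that calculation, assuming gaps are written as `-` and that every aligned column counts in the denominator; the program and exact convention used to obtain the 57.8% figure are not stated here, so the function `percent_identity` is illustrative only.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return percent identity between two equal-length aligned strings.

    A column counts as a match when both residues are equal and neither
    is a gap. Columns where both rows are gaps are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (x, y) for x, y in zip(aligned_a, aligned_b)
        if not (x == gap and y == gap)
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)


# Ten columns taken from the ORF1 / ORF1a alignment above (one gap in ORF1a).
print(percent_identity("SLPNRLKTRT", "SNP-KLKVQT"))  # 50.0
```

Other conventions exist (for example, excluding gap columns or unknown residues marked X from the denominator), and they give somewhat different percentages for the same alignment.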

The complete length ORF1a nucleotide sequence <SEQ ID 651> is:

1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCGAA 

51 AACCGGCCGC ATCCGCTTCT CGCCTGCTTA CTTAGCCATA TGCCTGTCGT 

101 TCGGCATTCT TCCCCAAGCT TGGGCGGGAC ACACTTATTT CGGCATCAAC 

151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG 

201 GGCGAAAGAT ATTGAGGTNT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT 

251 CAATGACAAA AGCCCCGATG ATTGATTTTT CTGTGGTGTC GCGTAACGGC 

301 GTGGCGGCAT TGGTGGGCGA TCAATATATT GTGAGCGTGG CACATAACGG 

351 CGGCTATAAC AACGTTGATT TTGGTGCGGA AGGAAGNAAT CCCGATCAGC 

401 ACCGTTTTTC TTACCAAATT GTGAAAAGAA ATAATTATAA GCCTGACAAT 

451 TCACACCCTT ACAACGGCGA TTANCATATG CCGCGTTTGC ATAAATTTGT 

501 CACAGATGCA GAACCTGTCG AAATGACGAG TGACATGAGG GGGAATACCT 

551 ATTCCGATAA AGAAAAATAT CCCGAGCGTG TCCGCATCGG CTCAGGACAC 

601 CACTATTGGC GTTATGATGA TGACAAACAC GGCGATTTAT CCTACTCCGG 

651 CGCATGGTTA ATTGGCGGCA ATACACATAT GCAGGGTTGG GGAAATAATG 

701 GCGTANTTAG TTTGAGCGGC GATGTGCGCC ATGCCAACGA CTATGGCCCT 

751 ATGCCGATTG CAGGTGCGGC AGGCGACAGC GGTTCGCCAA TGTTTATTTA

801 TGACAAAACA AACAATAAAT GGCTGCTCAA CGGAGTTTTA CAAACCGGCT 

851 ACCCTTATTC CGGCAGGGAA AACGGTTTCC AGCTGATACG CAAAGATTGG 

901 TTCTACGATG ACATTTACAG AGGCGATACA CATACCGTCT NTTTTGAACC 

951 GCGCAGTAAC GGACATTTTT CCTTTACATC CAACAACAAC GGTACGGGTA 

1001 CGGTAACAGA AACCAACGAA AAGGTNTCCA ATCCAAAGCT TAAAGTACAG 

1051 ACAGTCCGAC TGTTTGACGA ATCTTTGAAT GAAACTGATA AAGAACCAGT 

1101 TTACGCGGCA GGGGGTGTTA ATCAGTACCG TCCAAGGTTA AACAACGGTG 

1151 AAAACCTTTC TTTTATCGAT TACGGCAACG GCAAACTCAT CTTATCAAAC 

1201 AACATCAACC AAGGCGCGGG CGGTTTGTAT TTTGAAGGTG ATTTTACGGT 






1251 CTCGCCTGAA AACAACGAAA CGTGGCAAGG CGCGGGCGTT CATATCAGTG 

1301 AAGACAGTAC CGTTACTTGG AAAGTAAACG GCGTGGCAAA CGACCGCCTG 

1351 TCCAAAATCG GCAAAGGCAC GCTGCACGTT CAAGCCAAAG GGGAAAACCA 

1401 AGGCTCGATC AGCGTGGGCG ACGGTACAGT CATTTTGGAT CAGCAGGCAG 

1451 ACGATAAAGG CAAAAAACAA GCCTTTAGTG AAATCGGCTT GNTCAGCGGC 

1501 AGGGGTACGG TGCAACTGAA TGCCGATAAT CAGTTCAACC CCGACAAACT 

1551 CTATTTCGGC TTTCGCGGCG GACGTTTGGA TTTAAACGGG CATTCGCTTT 

1601 CGTTCCACCG TATTCAAAAT ACCGATGAAG GGGCGATGAT TGNCNATCAT 

1651 AATGCCACAA CAACATCCAC CGTTACCATT ACAGGGAATG AAAGTATTAC 

1701 ACAACCGAGT GGTAAGAATA TCAATAGACT TAATTACAGC AAAGAAATTG

1751 CCTACAACGG TTGGTTTGGC GAGAAAGATA CGACCAAAAC GAACGGGCGG 

1801 CTCAACCTTG TTTACCAGCC CGCCGCAGAA GACCGCACCC NGCTGCTTTC 

1851 CGGCGGAACA AATTTAAACG GCAACATCAC GCAAACAAAC GGCAAACTGT 

1901 TTTTCAGCGG CAGACCGACA CCGCACGCCT ACAATCATTT AGGAAGCGGG 

1951 TGGTCAAAAA TGGAAGGTAT CCCACAAGGA GAAATCGTGT GGGACAACGA 

2001 CTGGATCNAC CGCACGTTTA AAGCGGAAAA TTTCCATATT CAGGGCGGGC 

2051 AGGCGGTGAT TTCCCGCAAT GTTGCCAAAG TGGAAGGCGA TTGNCATTTG 

2101 AGCAATCACG CCCAAGCAGT TTTTGGTGTC GCACCGCATC AAAGCCATAC 

2151 AATCTGTACA CGTTCGGACT GGACNGGTCT GACAAATTGT GTCGAANAAA 

2201 NCATTACCGA CGATAAAGTG ATTGCTTCAT TGACTAAGAC NGACNTNAGC 

2251 GGCANTGTNA GNCTNNCCNA TNACGNTNNT TNAAANCTCN CNGGGCNTGC 

2301 NNCACTNAAN GGCAATCTTA GTGCAAATGG CGATACACGT TATACAGTCA 

2351 GCCACAACGC CACCCAAAAC GGCAACCTTA GCCTCGTGGG CAATGCCCAA 

2401 GCAACATTTA ATCAAGCCAC ATTAAACGGC AACNCATCGG NTTCGGGCAA 

2451 TGCTTCATTT AATCTAAGCA ACAACGCCGC ACAAAACGGC AGTCTGACGC 

2501 TTTCCGACAA CGCTAAGGCA AACGTAAGCC ATTCCGCACT CAACGGCAAT 

2551 GTCTCCCTAG CCGATAAGGC AGTATTCCAT TTTGAAAACA GCCGCTTTAC 

2601 CGGACAACTC AGCGGCAGCA AGGANACAGC ATTACACTTA AAAGACAGCG 

2651 AATGGACGCT GCCGTCAGGC ACGGAATTAG GCAATTTAAA CCTTGACAAC 

2701 GCCACCATTA CACTCAATTC CGCCTATCGC CACGATGCTG CAGGCGCGCA 

2751 AACCGGCAGN GTGTCAGACA CGCCGCGCCG CCGTTCGCGC CGTTCCCTAT 

2801 TATCCGTTAC ACCGCCAACT TCGGTAGAAT CCCGTTTCAA CACGCTGACG 

2851 GTAAACGGCA AATTGAACNG TCAAGGAACA TTCCGCTTTA TGTCGGAACT 

2901 CTTCGGCTAC CGAAGCGACA AATTGAAGCT GGCGGAAAGT TCCGAAGGNA
2951 CTTACACCTT GGCGGTCAAC AATACCGGCA ACGAACCCGT AAGCCTCGAT 
3001 CAATTGACGG TAGTGGAAGG GAAAGACAAC AAACCGCTGT CCGAAAACCT 
3051 TAATTTCACC CTGCAAAACG AACACGTCGA TGCCGGCGCG TGGCGTTACC 
3101 AACTCATCCG CAAAGACGGC GAGTTCCGCC TGCATAATCC GGTCAAAGAA 
3151 CAAGAGCTTT CCGACAAACT CGGCAAGGCA GAAGCCAAAA AACAGGCGGA 
3201 AAAAGACAAC GCGCAAAGCC TTGACGCGCT GATTGCGGCC GGGCGCGATG 
3251 CCGCCGAAAA GACAGAAAGC GTTGCCGAAC CGGCCCGGCN GGCAGGCGGG 
3301 GAAAATGTCG GCATTATGCA GGCGGAGGAA GAGAAAAAAC GGGTGCAGGC 
3351 GGATAAAGAC AGCGCNTTGG CGAAACAGCG CGAAGCGGAA ACCCGGCCGG 
3401 NTACCACCGC CTTCCCCCGC GCCCGCNGCG CCCGCCGGGA TTTGCCGCAA
3451 CCGCAGCCCC AACCGCAACC TCAACCCCAA CCGCAGCGCG ACCTGATNAG
3501 CCGTTATGCC AATAGCGGTT TGAGTGAATT TTCCGCCACG CTCAACAGCG 
3551 TTTTCGCCGT ACAGGACGAA TTGGACCGCG TGTTTGCCGA AGACCGCCGC 
3601 AACGCNGTTT GGACAAGCNG CATCCGGNAC ACCAAACACT ACCGTTCGCA 

3651 AGATTTCCGC GCCTACCGCC AACAAACCGA CCTGCGCCAA ATCGGTATGC
3701 AGAAAAACCT CGGCAGCGGG CGCGTCGGCA TCCTGTTTTC GCACAACCGG 
3751 ACCGAAAACA NCTTCGACGA CGGCATCGGC AACTCGGCAC GGCTTGCCCA 
3801 CGGCGCCGTT TTCGGGCAAT ACGGCATCGG CAGGTTCGAC ATCGGCATCA 
3851 GCACGGGCGC GGGTTTTAGC AGCGGCANTC TNTCAGACGG CATCGGAGGC 
3901 AAAATCCGCC GCCGCGTGCT GCATTACGGC ATTCAGGCAC GATACCGCGC 
3951 CGGTTTCGGC GGATTCGGCA TCGAACCGTA CATCGGCGCA ACGCGCTATT 

4001 TCGTCCAAAA AGCGGATTAC CGCTACGAAA ACGTCAATAT CGCCACCCCC
4051 GGTCTTGCGT TCAACCGNTA CCGNGCGGGC ATTAAGGCAG ATTATTCATT
4101 CAAACCGGCG CAACACATNT CCATCACNCC TTATTTNAGC CTGTCCTATA 
4151 CCGATGCCGC TTCGGGCAAA GTCCGAACAC GCGTCAATAC CGCNGTATTG 
4201 GCTCAGGATT TCGGCAAAAC CCGCAGTGCG GAATGGGGCG TAAACGCCGA
4251 AATCAAAGGT TTCACGCTGT CCNTCCACGC TGCCGCCGCC AAAGGNCCGC
4301 AACTGGAAGC GCAACACAGC GCGGGCATCA AATTAGGCTA CCGCTGGTAA

This encodes a protein having amino acid sequence <SEQ ID 652>: 

1 MKTTDKRTTE THRKAPKTGR IRFSPAYLAI CLSFGILPQA WAGHTYFGIN

51 YQYYRDFAEN KGKFAVGAKD IEVYNKKGEL VGKSMTKAPM IDFSVVSRNG

101 VAALVGDQYI VSVAHNGGYN NVDFGAEGXN PDQHRFSYQI VKRNNYKPDN 

151 SHPYNGDXHM PRLHKFVTDA EPVEMTSDMR GNTYSDKEKY PERVRIGSGH 

201 HYWRYDDDKH GDLSYSGAWL IGGNTHMQGW GNNGVXSLSG DVRHANDYGP 
251 MPIAGAAGDS GSPMFIYDKT NNKWLLNGVL QTGYPYSGRE NGFQLIRKDW

301 FYDDIYRGDT HTVXFEPRSN GHFSFTSNNN GTGTVTETNE KVSNPKLKVQ 

351 TVRLFDESLN ETDKEPVYAA GGVNQYRPRL NNGENLSFID YGNGKLILSN 

401 NINQGAGGLY FEGDFTVSPE NNETWQGAGV HISEDSTVTW KVNGVANDRL 

451 SKIGKGTLHV QAKGENQGSI SVGDGTVILD QQADDKGKKQ AFSEIGLXSG

501 RGTVQLNADN QFNPDKLYFG FRGGRLDLNG HSLSFHRIQN TDEGAMIXXH 

551 NATTTSTVTI TGNESITQPS GKNINRLNYS KEIAYNGWFG EKDTTKTNGR 

601 LNLVYQPAAE DRTXLLSGGT NLNGNITQTN GKLFFSGRPT PHAYNHLGSG 

651 WSKMEGIPQG EIVWDNDWIX RTFKAENFHI QGGQAVISRN VAKVEGDXHL 

701 SNHAQAVFGV APHQSHTICT RSDWTGLTNC VEXXITDDKV IASLTKTDXS

751 GXVXLXXXXX XXLXGXAXLX GNLSANGDTR YTVSHNATQN GNLSLVGNAQ 

801 ATFNQATLNG NXSXSGNASF NLSNNAAQNG SLTLSDNAKA NVSHSALNGN 

851 VSLADKAVFH FENSRFTGQL SGSKXTALHL KDSEWTLPSG TELGNLNLDN 

901 ATITLNSAYR HDAAGAQTGX VSDTPRRRSR RSLLSVTPPT SVESRFNTLT 

951 VNGKLNXQGT FRFMSELFGY RSDKLKLAES SEGTYTLAVN NTGNEPVSLD

1001 QLTVVEGKDN KPLSENLNFT LQNEHVDAGA WRYQLIRKDG EFRLHNPVKE

1051 QELSDKLGKA EAKKQAEKDN AQSLDALIAA GRDAAEKTES VAEPARXAGG 

1101 ENVGIMQAEE EKKRVQADKD SALAKQREAE TRPXTTAFPR ARXARRDLPQ 

1151 PQPQPQPQPQ PQRDLXSRYA NSGLSEFSAT LNSVFAVQDE LDRVFAEDRR 

1201 NAVWTSXIRX TKHYRSQDFR AYRQQTDLRQ IGMQKNLGSG RVGILFSHNR

1251 TENXFDDGIG NSARLAHGAV FGQYGIGRFD IGISTGAGFS SGXLSDGIGG 

1301 KIRRRVLHYG IQARYRAGFG GFGIEPYIGA TRYFVQKADY RYENVNIATP 

1351 GLAFNRYRAG IKADYSFKPA QHXSITPYXS LSYTDAASGK VRTRVNTAVL 

1401 AQDFGKTRSA EWGVNAEIKG FTLSXHAAAA KGPQLEAQHS AGIKLGYRW* 

A transmembrane region is underlined.

ORF1-1 shows 86.3% identity over a 1462aa overlap with ORF1a:

10 20 30 40 50 60 

orf la . pep MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAEN 
I I I I I I I I I I I I t I I I M I I I I I I I I I M i I M I I I I I I I f I I I t t I I I I 1 I I t I I I I I I 
orf1-1 MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAEN

10 20 30 40 50 60 

70 80 90 100 110 120 

or f la . pep KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSVVSRNGVAALVGDQYIVSVAHNGGYN 
35 I I I II I I I I t It I I I I II I i I II II 1 I I I M I I I I i I I I I I I I I I I I I II I I I I 1 I I I II 

orf 1-1 KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALVGDQYIVSVAHNGGYN 

70 80 90 100 110 120 

130 140 150 160 170 179 

orf1a.pep NVDFGAEGXNPDQHRFSYQIVKRNNYKPDNS-HPYNGDXHMPRLHKFVTDAEPVEMTSDM

I I I I I I I I I I I I I I I : I : I I I I II I I :: I I I : I I I I I I I I 111- 1 I I I I I I I I I I 
orf 1-1 NVDFGAEGRNPDQHR FT YKIVKRNNYKAGTKGHPYGGDYHMPRLHKFVT DAE PVEMTS YM 

130 140 150 160 170 180 

180 190 200 210 220 230

orf la. pep RGNTYSDKEKYPERVRIGSGHHYWRYDDDKHGDL--SYSGA WLIGGNTHMQGWGNN 

I | | : : : | | : M I I I : I : : I I I I : I : : : II I I I : I I I I I : : : : 

orf 1-1 DGRKYIDQNNYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAYSWLVGGNTFAQNGSGG 

190 200 210 220 230 240 


240 250 260 270 280 290 

orf la. pep GVXSLSGD-VRHANDYGPMPIAGAAGDSGSPMFXYDKTNNKWLLNGVLQTGYPYSGRENG 
I : : I : : : : : I : II : I : I : I I I I I I I I I I I : : I I I : I I I I I M I I I : II 
orf 1-1 GTVNLGSEKIKHS-PYGFLPTGGSFGDSGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNG 
250 260 270 280 290

300 310 320 330 340 350 

orf la . pep FQLIRKDWFYDDIYRGDTHTVXFEPRSNGHFSFTSNNNGTGTVTETNEKVSNP-KLKVQT 
I I I : I I I I I t I : I : M I I : I : I I I : II : : I I : : : M II I : : : I : I I : I I : : I 
orf1-1 FQLVRKDWFYDEIFAGDTHSVFYEPRQNGKYSFNDDNNGTGKINAKHEHNSLPNRLKTRT

300 310 320 330 340 350 

360 370 380 390 400 410 

orf la . pep VRLFDESLNETDKEPVY-AAGGVNQYRPRLNNGENLSFIDYGNGKLILSNNINQGAGGLY 
65 I : I I : I I : I I : II I I I I I I I I : I I I I II I I I I : II I I I : I : I M : : I II I I I I I I I 

orf 1-1 VQLFNVSLSETAREPVYHAAGGVNSYRPRLNNGENISFIDEGKGELILTSNINQGAGGLY 
360 370 380 390 400 410

420 430 440 450 460 470
orf1a.pep FEGDFTVSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGTLHVQAKGENQGSI
          |:||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf1-1    FQGDFTVSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGTLHVQAKGENQGSI
420 430 440 450 460 470



480 490 500 510 520 530 

orf1a.pep SVGDGTVILDQQADDKGKKQAFSEIGLXSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNG

I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I ! I I I I M I M I I I II I I I I I I ! I I M I 1 1 I I 
orfl-1 SVGDGTVILDQQADDKGKKQAFSEIGLVSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNG 
480 490 500 510 520 530 

540 550 560 570 580 590

orf la pep HSLSFHRIQNTDEGAMIXXHNATTTSTVTITGNESITQPSGKNINRLNYSKEIAYNGWFG 

I I I I I I I I I I I I! I I I I II I I I II I I I I : : I : I I I : : I I I I I I I I I I 

orfl-1 HSLSFHRIQNTDEGAMIVNHNQDKESTVTITGNKDIAT-TGNN-NSLDSKKEIAYWGWFG 
540 550 560 570 580 590 


600 610 620 630 640 650 

orf la . pep EKDTTKTNGRLNLVYQPAAEDRTXLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLGSG 

( I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I : : 
orfl-1 EKDTTKTNGRLNLVYQPAAEDRTLLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLNDH 
600 610 620 630 640 650

660 670 680 690 700 710 

orfla.pep WSKMEGIPQGEIVWDNDWIXRTFKAENFHIQGGQAVISRNVAKVEGDXHLSNHAQAVFGV 
II: | I I I : I I I I I I M I ! I I I I I I I I : I : I I I I I : I I I I I I I I I I I I I I I I I I I I I 
orf1-1 WSQKEGIPRGEIVWDNDWINRTFKAENFQIKGGQAVVSRNVAKVKGDWHLSNHAQAVFGV

660 670 680 690 700 710 

720 730 740 750 760 770 

orf la . pep APHQSHTICTRSDWTGLTNCVEXXITDDKVIASLTKTDXSGXVXLXXXXXXXLXGXAXLX 
35 I I II II II I I II I I I I I I I I II : I I I I I I I I I I I I I I I I I I 1:1 I : I 

orfl-1 APHQSHT I CTRS DWTGLTNCVEKT ITDDKVIASLTKTDI SGNVDLADHAHLNLTGLATLN 

720 730 740 750 760 770 

780 790 800 810 820 830 

orf1a.pep GNLSANGDTRYTVSHNATQNGNLSLVGNAQATFNQATLNGNXSXSGNASFNLSNNAAQNG

I I I I I I II I I I I I I I I 1 I I I I I I I I I I I I II II I I I I I I I I : I I I I I I I I I I :: I : I I I 
orfl-1 GNLSANGDTRYTVSHNATQNGNLSLVGNAQATFNQATLNGNTSASGNASFNLSDHAVQNG 
780 790 800 810 820 830 

840 850 860 870 880 890

orf la . pep SLTLSDNAKANVSHSALNGNVSLADKAVFHFENSRFTGQLSGSKXTALHLKDSEWTLPSG 
I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I : I I I I I I : I I : I I I I I I I I I I I I I I I I 
orfl-1 SLTLSGNAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWTLPSG 
840 850 860 870 880 890 


900 910 920 9.30 940 
orf la . pep TE LGN LNL DN AT I T LN S A YRH D AAG AQT GX V S DT P RRR S RR S LLSVTPPTSVESRFN 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I II I II M 1 II ! I! I 
orfl-1 TELGNLNLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRSLLSVTPPTSVESRFN 

900 910 920 930 940 950

950 960 970 980 990 1000 

orf la . pep TLTVNGKLNXQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPVSLDQLTWEG 

II I I I I II I I I I I I I I I I M I II II I 1 I 1 I I I I I I I I I I I I i I I I II I : ! I : I II M I I 
orf1-1 TLTVNGKLNGQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPASLEQLTVVEG

960 970 980 990 1000 1010 

1010 1020 1030 1040 1050 1060 

orf la . pep KDNKPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAEAKKQAE 
65 | | | | | | | | || | | | | || | | | | | | | | | | | | | M | | | | | | | M I I I I II I I I I I I I II I I I I 1 

orfl-1 KDNKPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAEAKKQAE 

1020 1030 1040 1050 1060 1070 

1070 1080 1090 1100 1110 1120 

orf1a.pep KDNAQSLDALIAAGRDAAEKTESVAEPARXAGGENVGIMQAEEEKKRVQADKDSALAKQR
          |||||||||||||||||:||||||||||| |||||||||||||||||||||||:||||||
orf1-1    KDNAQSLDALIAAGRDAVEKTESVAEPARQAGGENVGIMQAEEEKKRVQADKDTALAKQR

1080 1090 1100 1110 1120 1130 

1130 1140 1150 1160 1170 1180 

orfla pep eaetrpxttafprarxarrdlpqpqpqpqpqpqpqrdlxsryansglsefsatlnsvfav 

| | | I | | | ! | M I II II II I i I I i i t I I I I I I I I I I t I I I I I I I I I I I I I I I I I < 
orf l_l EAETRPATTAFPRARRARRDLPQLQPQPQPQP — QRDLISRYANSGLSEFSATLNSVFAV 

1140 1150 1160 1170 1180 1190 

1190 1200 1210 1220 1230 1240 

orfla pep QDELDRVFAEDRRNAVWTSXIRXTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFS 
| | i | | t I t I I I I I I I t I t I II I I I I I I I t I I I I M I I I I It I I I I I I M I I I I I I I I I 
orfl-1 QDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFS 
1200 1210 1220 1230 1240 1250 

1250 1260 1270 1280 1290 1300 

orfla . pep HNRTENXFDDGIGNSARLAHGAVFGQYGIGRFDIGISTGAGFSSGXLSDGIGGKIRRRVL 
I I I I I I : I I I M I I I I I I I I I I I I I M I I II 1111:1111111 I I I I II I I I I I I I I 
orfl-1 HNRTENTFDDGIGNSARLAHGAVFGQYGIDRFYIGISAGAGFSSGSLSDGIGGKIRRRVL 
1260 1270 1280 1290 1300 1310 

1310 1320 1330 1340 1350 1360 

orfla . pep H YG I QARYRAGFGGFG I E P Y I GAT R Y FVQKAD YRYENVN I AT PGLAFNR YRAG I KAD Y S F 

I I I I I I I M I I I M I I I I I : I I I I I I I I I I I I I I I II I I I I I I I I I M I I I II I I I I I I I 
orfl-1 HYGIQARYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGLAFNRYRAGIKADYSF 
1320 1330 1340 1350 1360 1370 

1370 1380 1390 1400 1410 1420 

orfla. pep KPAQHXSITPYXSLSYTDAASGBCVRTRVNTAVLAQDFGKTRSAEWGVNAEIKGFTLSXHA 
I I I I I I II I I 1 I i I II I i I I I I II I I I I I I I I I I I I I 1 I I I I I I i I I I I I I I I I I I 1 
orfl-1 K PAQH ISITPYLSLSYT DAAS GKVRT RVN T AV LAQ D FGKTR S AEWGVNAE I KG FT LS LHA 

1380 1390 1400 1410 1420 1430 

1430 1440 1450 

orfla. pep AAAKG PQLEAQH SAG IKLGYRWX 

I I I I I I I I I I I I I I I I I II I I I I 
orfl-1 AAAKG PQLE AQHS AG I KLG YRWX 

1440 1450 

Homology with adhesion and penetration protein hap precursor of H. influenzae (accession number P45387) 
Amino acids 23-423 of ORF1 show 59% aa identity with hap protein in 450 aa overlap: 

FXAAYLAI CL S FG I L PQAWAGHT Y FG IN YQ Y YRD FAENKGKFAVGAKD I E V YNKKGE LVG 82 
F +L C+S GI QAWAGHT Y FG I + YQ Y YRD FAENKGK F VGAK+IEVYNK+G+LVG 
FRLNFLTACVSLGIASQAWAGHTYFGIDYQYYRDFAENKGKFTVGAKNIEVYNKEGQLVG 65 

KSMTKAPMIDFSWSRNGVAALVGVQYIVSVAHNGGYNNVDFGAEGXNIXDQXRXTYKIV 142 
SMTKAPMI DFS WSRNGVAALVG QYIVSVAHNGGYN+VDFGAEG N DQ R TY+IV 



KRNNY+A + HPY GDYHMPRLHK VT+AEPV MT+ MDG+ Y D+ NYP+RVRIG+GR 



orfl 


23 


hap 


6 


orfl 


83 


hap 


66 


orfl 


143 


hap 


125 


orfl 


203 


hap 


185 


orfl 


223 


hap 


245 


orfl 


278 


hap 


305 



222 



QYWR+D+DE N SSY+++ 



-SGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGFQLVRKDWFYDEIFAGDTHSVF 277 
SGS PMFI YDA+K++WLIN VLQTG+P+ G+ NGFQL+R++WFY+E+ A DT SVF 



Y P NG YSF +N+GTGK+ + + + + TV+LFN SL++TA+E V A 
orfl 335 AGGVNSYRPRLNNGENISFIDEGKGELILTSNINQGAGGLYFQGDFTV-SPENNETWQGA 393 

A G N Y+PR+ G+NI D+GKG L + +NINQGAGGLYF+G+F V +NN TWQGA 
hap 364 AAGYNIYQPRMEYGKNIYLGDQGKGTLTIENNINQGAGGLYFEGNFWKGKQNNITWQGA 423 

5 orfl 394 GVHISEDSTVTWKVNGVANDRLSKIGKGTL 423 

GV I +D+TV WKV+ NDRLSKIG GTL 
hap 424 GVSIGQDATVEWKVHNPENDRLSKIGIGTL 453 
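
The percent identity figures quoted for these alignments can be illustrated with a short sketch. It is only an illustration, assuming identity is counted as the number of identical columns divided by the total number of aligned columns (gap columns included); the function name `percent_identity` is hypothetical and is not taken from the alignment program used here. The two strings are the final alignment block above (ORF1 residues 394-423 against hap residues 424-453).

```python
def percent_identity(a: str, b: str, gap: str = "-"):
    """Return (identical columns, aligned columns, percent identity).

    Both strings must already be aligned, i.e. have equal length,
    with `gap` marking insertions or deletions.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    # A column counts as identical only when both residues match
    # and neither of them is a gap character.
    identical = sum(1 for x, y in zip(a, b) if x == y and x != gap)
    return identical, len(a), 100.0 * identical / len(a)


# Last block of the ORF1 / hap alignment shown above.
orf1_block = "GVHISEDSTVTWKVNGVANDRLSKIGKGTL"
hap_block = "GVSIGQDATVEWKVHNPENDRLSKIGIGTL"

identical, columns, pct = percent_identity(orf1_block, hap_block)
print(f"{identical}/{columns} identical columns ({pct:.1f}% identity)")
```

Applied to a whole alignment rather than one block, the same count would give the overall figure reported for the overlap.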

Amino acids 715-1011 of ORF1 show 50% aa identity with hap protein in 258 aa overlap: 

Orfl 4 1 DTRYTVSHNATQ-NGNXSLVXNAQATFNQ-ATLNGNTSASGNASFNLSDHAVQNGSLTLS 98 
10 DT+ s TQ NG+ +L NA + A LNGN + ++ F LS++A Q G++ LS 

hap 733 DTKVINSIPITQINGSINLTNNATVNIHGLAKLNGNVTLIDHSQFTLSNNATQTGNIKLS 7 92 

orfl 99 GNAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWTLPSGXELGN 158 
+A A V+++ LNGNV L D A F ++S F QI G KDT + L+++ WT+PS L N 
15 hap 793 NHANATVNNATLNGNVHLTDSAQFSLKNSHFWHQIQGDKDTTVTLENATWTMPSDTTLQN 852 

orfl 159 LNLDNATITLNSAYRHDAAGAQTGSATDAPXXXXXXXXXXLLXVTPPTSVESRFNTLTVN 218 

L L+N+T+TLNSAY + S+ +AP L T PTS E RFNTLTVN 

hap 853 LTLNNSTVTLNSAY SASSNNAPRHRRS LETETTPTSAEHRFNTLTVN 899 






orfl 219 GKLNGQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPASLEQLTWEGKDNKP 278 

GKL+GQGTF+F S LFGY+SDKLKL+ +EG YTL+V NTG EP +LEQLT++E DNKP 
hap 900 GKLSGQGTFQFTSSLFGYKSDKLKLSNDAEGDYTLSVRNTGKEPVTLEQLTLIESLDNKP 959 



25 orfl 279 LSENLNFTLQNEHVDAGA 296 

LS+ L FTL+N+HVDAGA 
hap 960 LSDKLKFTLENDHVDAGA 977 

Amino acids 1192-1450 of ORF1 show 41% aa identity with hap protein in 259 aa overlap: 

Orfl 1 L DRV FAE DRRN AVWT SGIRDTKHYRSQD FRA YRQQT D LRQ I GMQKN L G S GR VG I L F S HNR 60 
30 LDR+F + -H+AVWT4- 4-D + Y S FRAY+Q+T+LRQIG+QK L +GR+G +FSH+R 

hap 1135 LDRLFVDQAQSAVWTNIAQDKRRYDSDAFRAYQQKTNLRQIGVQKALANGRIGAVFSHSR 1194 

orfl 61 TENT FDDG I GN SARLAHGAVFGQ YG I DRFYXXXXXXXXXXXXXXXXX I GXKXRRRVLHYG 120 
++NTFD+ +NAL+FQY KR+ ++YG 

35 hap 1195 SDNTFDEQVKNHATLTMMSGFAQYQWGDLQFGVNVGTGISASKMAEEQSRKIHRKAINYG 1254 



orfl 121 IQARYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGLAFNRYRAGIKADYSFKPA 180 

+ A Y+ G GI+P+ G RYF+++ +Y+ E V + TP LAFNRY AGI+ DY+F P 

hap 1255 VNASYQFRLGQLGIQPYFGVNRYFIERENYQSEEVRVKTPSLAFNRYNAGIRVDYTFTPT 1314 

orfl 181 QHISITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEIKGFTLSLHAAAA 240 

+IS+ PY ++Y D ++ V+T VN VL Q FG+ E G+ AEI F +S + + 

hap 1315 DNISVKPYFFVNYVDVSNANVQTTVNLTVLQQPFGRYWQKEVGLKAEILHFQISAFISKS 137 4 



45 orfl 241 KG PQLE AQH SAG I KLGYRW 259 

+G QL Q + G+KLGYRW 
hap 1375 QGSQLGKQQNVGVKLGYRW 1393 

Homology with a predicted ORF from N. gonorrhoeae 
The three blocks of ORF1 show 83.5%, 88.3% and 97.7% identity in 467, 298 and 259 aa overlaps, 
respectively, with a predicted ORF (ORF1ng) from N. gonorrhoeae: 



orf 1 . pep MKTTDKRTTETHRKAPKTGRIRFXAAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAEN 60 

I II f I I I t t I I I I t t I t I I t I I I I I I I I I I I I I I t I t I I I I I I I i I II i I t I I I I I I 

orf Ing MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQARAGHTYFGINYQYYRDFAEN 60 

orfl . pep KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALVGVQYIVSVAHNGGYN 120 

I I I I I ) I I I I I i I I I I I II I I I I I ! I t I 1 I II 1 I I I I I I I I I I I : I M I I I I I I I I II I 

orf lng KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALAGDQYIVSVAHNGGYN 120 



60 orf 1 .pep NVDFGAEGXNIXDQXRXTYKIVKRNNYKAGTKGHPYGGDYHMPRLHKXVTDAEPVEMTSY 180 

I I I I I I I I I II I : I : I I I I I I I I I I I : 1 I I I II M I I I I 1 I I I I I I I I II I I 1 I 
orf lng NVDFGAEGSN-PDQHRFSYQIVKRNNYKAGTNGHPYGGDYHMPRLHKFVTDAEPVEMTSY 17 9 



orf 1 .pep 
orf lng 
orf 1 .pep 
orf lng 
orf 1 .pep 
orf lng 
orf 1 .pep 
orf lng 
orf 1 .pep 
orf lng 
orf 1 .pep 
orf lng 
orf 1 -pep 
orf lng 
orf 1 .pep 
orf lng 
orf 1 . pep 
orf lng 
orf 1 .pep 
orf lng 
orf 1 .pep 
orf lng 
orf 1 . pep 
orf lng 
orf 1 . pep 
orf lng 
orf 1 . pep 
orf lng 
orf 1 . pep 
orf lng 
orf 1 .pep 
orf lng 



M DGRKY I DQNN Y P DRVRIGAGRQ YWRS DE DE PNNRE S S YH I AS 

111 || | I : I I 1 I I 1 I I I I I I I I I I I I I I I I I ! M I i i I I i 

MDGWKYADLNKYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAYSWLVGGNTFAQNGSG 

GSFMFIYDA QKQKWLINGVLOTGNFYIGKSNG 

1 I | | j 1 I I I I I I 1 I I i I I I i I t ! i I i I I I 1 I I 
GGTVNLGSEKIKH5PY GFLPTGGSFGDSGSPMFIYDA QKQKWLINGVLOTGNPYIGKSNG 



223 



239 



255 



289 



FQLVRKDWFYDEIFAGDTHSVFYEPRQNGKYSFNDDNNGTGKINAKHEHNSLPNRLKTRT 315 

TTTIHIIMIIIII1MIIIIIII:1IIM 111:11 Ul I I = I ICi It! Mill! 
FOLVRKDWFYDEIFAGDTHSVFYEPHQNGKYFFNDNNNGAGKIDAKHKHYSLPYRLKTRT 359 

VQLFNVSLSETAREPVYHAAGGVNSYRPRLNNGENISFIDEGKGELILTSNINQGAGGLY 375 
IlillllllltlMllllllMMIIiilMllllilliCIIIiiiitiMIIMIM! 
VQL FN V S L S E TARE P V YHAAGG VN S YR PRLNNGEN I S F I DKGKGE L I LT SN I N QG AGGL Y 

FQGDFTVSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGT 422 
I : I : I I I I I : I I 1 I I I ! M I M I : ! 1 i I i I i I I I i I I f I M 11 M I 

FEGNFTVSPKNNETWQGAGVHISDGSTVTWKVNGVANDRLSKIGKGTLLVQAKGENQGSV 47 9 

// 

DKVT AS LTKT D I SGN VDLADHAHLNLTGLA 7 4 4 
III I I I : I I I : t I I : I I I I I i II i I I I i 
FGVAPHQSHTICTRSDWTGLTSCTEKTITDDKVIASLSKTDVRGNVSLADHAHLNLTGLA 774 

TLNGNLSANGDTR-YTVSHNATQNGNXSLVXNAQATFNQATLNGNTSASGNASFNLSDHA 803 

1:1111 ::::!! : i I I I 1 I i Mi I I I I I! I I i I I I I I I ! I I I I I I II I : : I 
TFNGNL-VQAETRTIRLRANATQNGNLSLVGNAQATFNQATLNGNTSASDNASFNLSNNA 833 

VQNGSLTLSGNAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWT 863 
I M I I I I I I I I I ! ! II ! I I I I I 1 I I 1 I I I I I I M I : I i i i i : I I I i i I i I I M I I I I I i 
VQNGSLTLSDNAKANVSHSALNGNVSLADKAVFHFENSRFTGKISGGKDTALHLKDSEWT 8 93 

LPSGXELGNLNLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRSLLXVTPPTSVE 923 

I I I I : I f I I I I M I I I I I I I I I I I I I i I I I I I I I i : I I 1 I I II II I II 111111:1 

L P S GTE LGN LN L DN AT I T LN S A Y RH D AAG AQT G S AAD APRRR SRR S LLSVTPPTSAE 950 

SRFNTLTVNGKLNGQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPASLEQLT 983 

I I I \ I II I I I I I I I I I I I I I I I I i I I I I I I I II I i I ! I I I I I I I I II I I I I I : I i I I I I 
SRFNTLTVNGKLNGQGTFRFMSELFGYRSGKLKLAESSEGTYTLAVNNTGNEPVSLEQLT 1010 

VVEGKDNKPLSENLNFTLQNEHVDAGAW 1011 
I I I I I I I I I I I I I I I I II I I I ) I I I I I 

VVEGKDNTPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAGET 1070 

// 

LDRVFAEDRRNAVWTSGIRDTKHYRSQDFR 1211 
M i I i t I I I I II I I i I I I I I I 1 I I I I I I I I 
PQRDLISRYANSGLSEFSATLNSVFAVQDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFR 1239 

AYRQQTDLRQIGMQKNLGSGRVGILFSHNRTENTFDDGIGNSARLAHGAVFGQYGIDRFY 1271 
I I I I I I I I II I I I t I I I II t I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I t I I I I II 
AYRQQTDLRQIGMQKNLGSGRVGILFSHNRTGNTFDDGIGNSARLAHGAVFGQYGIGRFD 1299 

IGISAGAGFSSGSLSDGIGXKXRRRVLHYGIQARYRAGFGGFGIEPHIGATRYFVQKADY 1331 
I I I I II I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I II I I I I 
IGISAGAGFSSGSLSDGIRGKIRRRVLHYGIQARYRAGFGGFGIEPHIGATRYFVQKADY 1359 

R YEN VN I AT PGLAFNRYRAG I KADY S FKPAQH I S I T P YLSLS YT DAASGKVRTRVNT AVL 1391 
I I I I I I I I I I 1 II II I I II I II I I I I I I I t I I I I I I I ! I I I I II I I I I I I I I I II I I I I I 
RYENVNIATPGLAFNRYRAGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVNTAVL 1419 

AQDFGKTRSAEWGVNAEIKGFTLSLHAAAAKGPQLEAQHSAGIKLGYRW 14 40 
I I I I I I I I I I I I I I I I I I I I I II I I I I 1 I I I I I I 1 I I I I I II I I I I I I I 
AQD FGKTRS AE WG VNAE I KG FTLS LHAAAAKG PQLE AQHS AG I KLG YRW 14 68 



The complete length ORF1ng nucleotide sequence was identified <SEQ ID 653>: 

1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCTAA 
51 AACCGGCCGC ATCCGCTTCT CGCCCGCTTA CTTAGCCATA TGCCTGTCGT 
101 TCGGCATTCT GCCCCAAGCC CGGGCGGGAC ACACTTATTT CGGCATCAAC 
151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG 
201 GGCGAAAGAT ATTGAGGTTT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT 
251 CGATGACGAA AGCCCCGATG ATTGATTTTT CTGTGGTATC GCGTAACGGC 
301 GTGGCGGCAT TGGCGGGCGA TCAATATATT GTGAGCGTGG CACATAACGG 
351 CGGCTATAAC AATGTTGATT TTGGTGCGGA GGGAAGCAAT CCCGATCAGC 
401 ACCGCTTTTC TTACCAAATT GTGAAAAGAA ATAATTATAA AGCAGGGACT 
451 AACGGCCATC CTTATGGCGG CGATTATCAT ATGCCGCGTT TGCACAAATT 
501 TGTCACAGAT GCAGAACCTG TTGAGATGAC CAGTTATATG GATGGGTGGA 
551 AATACGCTGA TTTAAATAAA TACCCTGATC GTGTTCGAAT CGGAGCAGGC 
601 AGACAATATT GGCGGTCTGA TGAAGACGAA CCCAATAACC GCGAAAGTTC 
651 ATATCATATT GCAAGCGCAT ATTCTTGGCT CGTCGGTGGC AATACCTTTG 
701 CACAAAATGG ATCAGGTGGT GGCACAGTCA ACTTAGGTAG CGAAAAAATT 
751 AAACATAGCC CATATGGTTT TTTACCAACA GGAGGCTCAT TTGGCGACAG 
801 TGGCTCACCA ATGTTTATCT ATGATGCCCA AAAGCAAAAG TGGTTAATTA 
851 ATGGGGTATT GCAAACAGGC AACCCCTATA TAGGAAAAAG CAATGGCTTC 
901 CAGCTAGTTC GTAAAGATTG GTTCTATGAT GAAATCTTTG CTGGAGATAC 
951 CCATTCAGTA TTCTACGAAC CACATCAAAA TGGGAAATAC TTTTTTAACG 
1001 ACAATAATAA TGGCGCAGGA AAAATCGATG CCAAACATAA ACACTATTCT 
1051 CTACCTTATA GATTAAAAAC ACGAACCGTT CAATTGTTTA ATGTTTCTTT 
1101 ATCCGAGACA GCAAGAGAAC CTGTTTATCA TGCTGCAGGT GGGGTCAACA 
1151 GTTATCGACC CAGACTGAAT AATGGAGAAA ATATTTCCTT TATTGACAAA 
1201 GGAAAAGGTG AATTGATACT TACCAGCAAC ATCAACCAAG GCGCGGGCGG 
1251 TTTGTATTTT GAGGGTAATT TTACGGTCTC GCCTAAAAAC AACGAAACGT 
1301 GGCAAGGCGC GGGCGTTCAT ATCAGTGATG GCAGTACCGT TACTTGGAAA 
1351 GTAAACGGCG TGGCAAACGA CCGCCTGTCC AAAATCGGCA AAGGCACGCT 
1401 GCTGGTTCAA GCCAAAGGGG AAAACCAAGG CTCGGTCAGC GTGGGCGACG 
1451 GTAAAGTCAT CTTAGATCAG CAGGCGGACG ATCAAGGCAA AAAACAAGCC 
1501 TTTAGTGAAA TCGGCTTGGT CAGCGGCAGG GGGACGGTGC AACTGAATGC 
1551 CGATAATCAG TTCAACCCCG ACAAACTCTA TTTCGGCTTT CGCGGCGGAC 
1601 GTTTGGATTT GAACGGGCAT TCGCTTTCGT TCCACCGCAT TCAAAATACC 
1651 GATGAAGGGG CGATGATTGT CAACCACAAT CAAGACAAAG AATCCACCGT 
1701 TACCATTACA GGCAATAAAG ATATTACTAC AACCGGCAAT AACAACAACT 
1751 TGGATAGCAA AAAAGAAATT GCCTACAACG GTTGGTTTGG CGAGAAAGAT 
1801 GCAACCAAAA CGAACGGGCG GCTCAATCTG AATTACCAAC CGGAAGAAGC 
1851 GGATCGCACT TTACTGCTTT CCGGCGGAAC AAATTTAAAC GGCAATATCA 
1901 CGCAAACAAA CGGCAAACTG TTTTTCAGCG GCAGACCGAC ACCGCACGCC 
1951 TACAATCATT TAGGAAGCGG GTGGTCAAAA ATGGAAGGTA TCCCACAAGG 
2001 AGAAATCGTG TGGGACAACG ATTGGATCGA CCGCACATTT AAAGCGGAAA 
2051 ACTTCCATAT TCAGGGCGGA CAAGCGGTGG TTTCCCGCAA TGTTGCCAAA 
2101 GTGGAAGGCG ATTGGCATTT AAGCAATCAC GCCCAAGCAG TTTTCGGTGT 
2151 CGCACCGCAT CAAAGCCACA CAATCTGTAC ACGTTCGGAC TGGACGGGTC 
2201 TGACAAGTTG TACCGAAAAA ACCATTACCG ACGATAAAGT GATTGCTTCA 
2251 TTGAGCAAGA CCGACATCAG AGGCAATGTC AGCCTTGCCG ATCACGCTCA 
2301 TTTAAATCTC ACAGGACTTG CCACACTCAA CGGCAATCTT AGTGCAGGCG 
2351 GAGACACGCA CTATACGGTT ACGCGCAACG CCACCCAAAA CGGCAACCTC 
2401 AGCCTCGTGG GCAATGCCCA AGCAACATTT AATCAAGCCA CATTAAACGG 
2451 CAACACATCG GCTTCGGACA ATGCTTCATT TAATCTAAGC AACAACGCCG 
2501 TACAAAACGG CAGTCTGACG CTTTCCGACA ACGCTAAGGC AAACGTAAGC 
2551 CATTCCGCAC TCAACGGCAA TGTCTCCCTA GCCGATAAGG CAGTATTCCA 
2601 TTTTGAAAAC AGCCGCTTTA CCGGAAAAAT CAGCGGCGGC AAGGATACGG 
2651 CATTACACTT AAAAGACAGC GAATGGACGC TGCCGTCGGG CACGGAATTA 
2701 GGCAATTTAA ACCTTGACAA CGCCACCATT ACACTCAATT CCGCCTATCG 
2751 ACACGATGCG GCAGGCGCGC AAACCGGCAG TGCGGCAGAT GCGCCGCGCC 
2801 GCCGTTCGCG CCGTTCCCTA TTATCCGTTA CGCCGCCAAC TTCGGCAGAA 
2851 TCCCGTTTCA ACACGCTGAC GGTAAACGGC AAATTGAACG GTCAGGGAAC 
2901 ATTCCGCTTT ATGTCGGAAC TCTTCGGCTA CCGCAGCGGC AAATTGAAGC 
2951 TGGCGGAAAG TTCCGAAGGC ACTTACACCT TGGCTGTCAA CAATACCGGC 
3001 AACGAACCCG TAAGTCTCGA GCAATTGACG GTAGTGGAAG GAAAAGACAA 
3051 CACACCGCTG TCCGAAAATC TTAATTTCAC CCTGCaaaAc gaacacgtcg 
3101 atgccggcgc atggCGTTAT CAGCTTATCC gcaaagacgG CGAGTTCCgc 
3151 CTGCATAATC CGGTCAAAGA ACAAGAGCTT TCCGACAAAC TCGGCAAGgc 
3201 gggagaaACA GAggccgccT TGACGGCAAA ACAGGCacaA CTTGCCGCCA 
3251 AAcaacaggc ggaaaAAGAC AACgcgcaaa gccttgAcgc gctgattgcg 
3301 gCcgggcgca atgccaccga AAAGGCAgaa agtgttgccg aaccgGCCCG 
3351 GCAGGCAGGC GGGGAAAAtg ccgGCATTAT GCAGGCGGAG GAAGAGAAAA 
3401 AACGGGTGCA GGCGGATAAA GACACCGCCT TGGCGAAACA GCGCGAAGCG 
3451 GAAACCCGGC CGGCTACCAC CGCCTTCCCC CGCGCCCGCC GCGCCCGCCG 
3501 GGATTTGCCG CAACCGCAGC CCCAACCGCA ACCCCAACCG CAGCGCGACC 
3551 TGATCAGCCG TTATGCCAAT AGCGGTTTGA GTGAATTTTC CGCCACGCTC 
3601 AACAGCGTTT TCGCCGTACA GGACGAATTG GACCGCGTGT TTGCCGAAGA 
3651 CCGCCGCAAC GCCGTTTGGA CAAGCGGCAT CCGGGACACC AAACACTACC 
3701 GTTCGCAAGA TTTCCGCGCC TACCGCCAAC AAACCGACCT GCGCCAAATC 
3751 GGTATGCAGA AAAACCTCGG CAGCGGGCGC GTCGGCATCC TGTTTTCGCA 
3801 CAACCGGACC GGAAACACCT TCGACGACGG CATCGGCAAC TCGGCACGGC 
3851 TTGCCCACGG TGCCGTTTTC GGGCAATACG GCATCGGCAG GTTCGACATC 
3901 GGCATCAGCG CGGGCGCGGG TTTTAGTAGC GGCAGCCTTT CAGACGGCAT 
3951 CAGAGGCAAA ATCCGCCGCC GCGTGCTGCA TTACGGCATT CAGGCAAGAT 
4001 ACCGCGCAGG TTTCGGCGGA TTCGGCATCG AACCGCACAT CGGCGCAACG 
4051 CGCTATTTCG TCCAAAAAGC GGATTACCGA TACGAAAACG TCAATATCGC 
4101 CACCCCGGGC CTTGCATTCA ACCGCTACCG CGCGGGCATT AAGGCAGATT 
4151 ATTCATTCAA ACCGGCGCAA CACATTTCCA TCACGCCTTA TTTGAGCCTG 
4201 TCCTATACCG ATGCCGCTTC CGGCAAAGTC CGAACGCGCG TCAATACCGC 
4251 CGTATTGGCG CAGGATTTCG GCAAAACCCG CAGTGCGGAA TGGGGCGTAA 
4301 ACGCCGAAAT CAAAGGTTTC ACGCTGTCCC TCCACGCTGC CGCCGCCAAG 
4351 GGGCCGCAAT TGGAAGCGCA GCACAGCGCG GGCATCAAAT TAGGCTACCG 
4401 CTGGTAA 

This is predicted to encode a protein having amino acid sequence <SEQ ID 654>: 

1 MKTTDKRTTE THRKAPKTGR IRFSPAYLAI CLSFGILPQA RAGHTYFGIN 
51 YQYYRDFAEN KGKFAVGAKD IEVYNKKGEL VGKSMTKAPM IDFSVVSRNG 
101 VAALAGDQYI VSVAHNGGYN NVDFGAEGSN PDQHRFSYQI VKRNNYKAGT 
151 NGHPYGGDYH MPRLHKFVTD AEPVEMTSYM DGWKYADLNK YPDRVRIGAG 
201 RQYWRSDEDE PNNRESSYHI ASAYSWLVGG NTFAQNGSGG GTVNLGSEKI 
251 KHSPYGFLPT GGSFGDSGSP MFIYDAQKQK WLINGVLQTG NPYIGKSNGF 
301 QLVRKDWFYD EIFAGDTHSV FYEPHQNGKY FFNDNNNGAG KIDAKHKHYS 
351 LPYRLKTRTV QLFNVSLSET AREPVYHAAG GVNSYRPRLN NGENISFIDK 
401 GKGELILTSN INQGAGGLYF EGNFTVSPKN NETWQGAGVH ISDGSTVTWK 
451 VNGVANDRLS KIGKGTLLVQ AKGENQGSVS VGDGKVILDQ QADDQGKKQA 
501 FSEIGLVSGR GTVQLNADNQ FNPDKLYFGF RGGRLDLNGH SLSFHRIQNT 
551 DEGAMIVNHN QDKESTVTIT GNKDITTTGN NNNLDSKKEI AYNGWFGEKD 
601 ATKTNGGLNL NYPPEEADRT LLLSGGTNLN GNITQTNGKL FFSGRPTPHA 
651 YNHLGSGWSK MEGIPQGEIV WDNDWIDRTF KAENFHIQGG QAVVSRNVAK 
701 VEGDWHLSNH AQAVFGVAPH QSHTICTRSD WTGLTSCTEK TITDDKVIAS 
751 LSKTDVRGNV SLADHAHLNL TGLATFNGNL VQAETRTIRL RANATQNGNL 
801 SLVGNAQATF NQATLNGNTS ASDNASFNLS NNAVQNGSLT LSDNAKANVS 
851 HSALNGNVSL ADKAVFHFEN SRFTGKISGG KDTALHLKDS EWTLPSGTEL 
901 GNLNLDNATI TLNSAYRHDA AGAQTGSAAD APRRRSRRSL LSVTPPTSAE 
951 SRFNTLTVNG KLNGQGTFRF MSELFGYRSG KLKLAESSEG TYTLAVNNTG 
1001 NEPVSLEQLT VVEGKDNTPL SENLNFTLQN EHVDAGAWRY QLIRKDGEFR 
1051 LHNPVKEQEL SDKLGKAGET EAALTAKQAQ LAAKQQAEKD NAQSLDALIA 
1101 AGRNATEKAE SVAEPARQAG GENAGIMQAE EEKKRVQADK DTALAKQREA 
1151 ETRPATTAFP RARRARRDLP QPQPQPQPQP QRDLISRYAN SGLSEFSATL 
1201 NSVFAVQDEL DRVFAEDRRN AVWTSGIRDT KHYRSQDFRA YRQQTDLRQI 
1251 GMQKNLGSGR VGILFSHNRT GNTFDDGIGN SARLAHGAVF GQYGIGRFDI 
1301 GISAGAGFSS GSLSDGIRGK IRRRVLHYGI QARYRAGFGG FGIEPHIGAT 
1351 RYFVQKADYR YENVNIATPG LAFNRYRAGI KADYSFKPAQ HISITPYLSL 
1401 SYTDAASGKV RTRVNTAVLA QDFGKTRSAE WGVNAEIKGF TLSLHAAAAK 
1451 GPQLEAQHSA GIKLGYRW* 

Underlined and double-underlined sequences represent the active site of a serine protease (trypsin 
family) and an ATP/GTP-binding site motif A (P-loop). 
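
A P-loop of the kind mentioned above can be located with a simple pattern search. The sketch below is illustrative only: it assumes the widely used PROSITE-style consensus [AG]-x(4)-G-K-[ST] for ATP/GTP-binding site motif A, which is not stated in this text, and the helper name `find_p_loops` is hypothetical. The fragment scanned is residues 281-300 of <SEQ ID 654>.

```python
import re

# Assumed consensus for ATP/GTP-binding site motif A (P-loop):
# [AG]-x(4)-G-K-[ST], written here as a regular expression.
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_p_loops(protein: str, first_residue: int = 1):
    """Return (1-based start position, matched residues) for each hit.

    `first_residue` is the sequence position of protein[0], so that
    fragments can be reported in full-length numbering.
    """
    return [(m.start() + first_residue, m.group())
            for m in P_LOOP.finditer(protein)]


# Residues 281-300 of the ORF1ng protein listed above.
fragment = "WLINGVLQTGNPYIGKSNGF"
hits = find_p_loops(fragment, first_residue=281)
print(hits)
```

Under that assumed consensus the search reports a single hit, GNPYIGKS, starting at residue 290.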



ORF1-1 and ORF1ng show 93.7% identity in 1471 aa overlap: 

10 20 30 40 50 60 

MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAEN 
i I I I < M I I I I I I I I I I t I I I I I I I I I I ( I i I I i I i I ( t i I I t I II I I 1 I I I I f I I I t I 
MKTTDKRTTETHRKAPKTGRIRFSPAYIiAICLSFGILPQARAGHTYFGINYQYYRDFAEN 
10 20 30 40 50 60 

70 80 90 100 110 120 

KGKFAVGAKD I EVYNKKGELVGKSMTKAPM I DFSWSRNGVAALVGDQY I VSVAHNGGYN 
I I I I I I I I I I I I I ! I I M I I I I 1 I I I I I ! I I I I ! I I I I 1 ! II I I : I I 1 I I M M I I I I I i 
KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALAGDQYI VSVAHNGGYN 
70 80 90 100 110 120 



orf 1-1 .pep 
orf lng-1 

orf 1-1 . pep 
orf lng-1 



130 140 150 160 170 180 

orf 1-1 pep NVDFGAEGRNPDQHRFTYKIVKRNNYKAGTKGHPYGGDYHMPRLHKFVTDAEPVEMTSYM 
I I I I I I I I I I I 1 I I I : ! : I I I 1 I I I I I I I : ! 1 ! ! 1 ! I I I ! I I I I I I i i I I I I I f i i i I I 
orflng-1 NVDFGAEGSNPDQHRFSYQIVKRNNYKAGTNGHPYGGDYHMPRLHKFVTDAEPVEMTSYM 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 1-1 . pep DGRKYIDQNNYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAYSWLVGGNTFAQNGSGG 
I i ) | i | : I | f I I I I i i i II i 1 t I I I I M I I I I I i I I I I I i I I I I I I I I I I I I I I I I t 
orflng-1 DGWKYADLNKYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAYSWLVGGNTFAQNGSGG 

190 200 210 220 230 240 






250 260 270 280 290 300 

orfl-l.pep GTVNLGSEKIKHSPYGFLPTGGSFGDSGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGF 
I I I I I I I I I I I I I j I 1 I 1 I I i I I I I I I 1 I I 1 I II I I I I i ! I I I I I I I I I I I I I I I I I I I I 
orflng-1 GTVNLGSEKIKHSPYGFLPTGGSFGDSGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGF 

250 260 270 280 290 300 






310 320 330 340 350 360 

orfl-l.pep QLVRKDWFYDEIFAGDTHSVFYEPRQNGKYSFNDDNNGTGKINAKHEHNSLPNRLKTRTV 
I I I I t I II I II I II I I I II t I I I I : I II I I I I I : ! I I : I I I : II I : I ill I I I M I I 
orflng-1 QLVRKDWFYDEIFAGDTHSVFYEPHQNGKYFFNDNNNGAGKIDAKHKHYSLPYRLKTRTV 

310 320 330 340 350 360 

370 380 390 400 410 420 

orfl-l.pep QLFNVSLSETAREPVYHAAGGVNSYRPRLNNGENISFIDEGKGELILTSNINQGAGGLYF 
I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I I : 1 I I I I i I I I I I M I II I I I I 
orflng-1 QLFNVSLSETAREPVYHAAGGVNSYRPRLNNGENISFIDKGKGELILTSNINQGAGGLYF 

370 380 390 400 410 420 

430 440 450 460 470 480 

orf 1-1 . pep QGDFTVSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGTLHVQAKGENQGSIS 
: I : I I I I I : I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I II ! M II I I I I I I I I I I : I 
orflng-1 EGNFTVSPKNNETWQGAGVHISDGSTVTWKVNGVANDRLSKIGKGTLLVQAKGENQGSVS 

430 440 450 460 470 480 






490 500 510 520 530 540 

orfl-l.pep VGDGTVILDQQADDKGKKQAFSEIGLVSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNGH 
I I I I I I M II I I I : I I I I II I 1 I f I II I I I I I I I I I I I I I I I I ! M I I I I I I I I I I I I I 
orflng-1 VGDGKVILDQQADDQGKKQAFSEIGLVSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNGH 

490 500 510 520 530 540 

550 560 570 580 590 600 

orf 1-1 . pep SLSFHRIQNTDEGAMIVNHNQDKESTVTITGNKDIATTGNNNSLDSKKEIAYNGWFGEKD 
I I I I I M I I I I I I II II I I I I I I I I I I I M I I I I I : I I I I I I : I I I I I I I I I I I M I I I I 
orflng-1 SLSFHRIQNTDEGAMIVNHNQDKESTVTITGNKDITTTGNNNNLDSKKEIAYNGWFGEKD 

550 560 570 580 590 600 

610 620 630 640 650 660 

orf 1-1 . pep TTKTNGRLNLVYQPAAEDRTLLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLNDHWSQ 
: I I I I II I I I IM I I 1 I I I I I I 1 I I I I I I I I I I M I I I I I I | | I I | ! t I I : : ! I : 
orflng-1 ATKTNGRLNLNYQPEEADRTLLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLGSGWSK 

610 620 630 640 650 660 

670 680 690 700 710 720 

orf 1-1 . pep KEGIPRGEIVWDNDWINRTFKAENFQIKGGQAWSRNVAKVKGDWHLSNHAQAVFGVAPH 
I I I I = I I I I I I M I I : II I I I I ( I : I : I I I I I I I t t I I I I : I I II I I ! I I I I I I I I I I | 
orflng-1 MEGIPQGEIVWDNDWIDRTFKAENFHIQGGQAWSRNVAKVEGDWHLSNHAQAVFGVAPH 

670 680 690 700 710 720 

730 740 750 760 770 780 

orfl-l.pep QSHTICTRSDWTGLTNCVEKTITDDKVIASLTKTDISGNVDLADHAHLNLTGLATLNGNL 
I I M I I I I M I I I I I : I : I I I I I I | | | I | | | : | | | | I I I : I I | | | I I I I I I I I I I I I I | 
orflng-1 QSHTICTRSDWTGLTSCTEKTITDDKVIASLSKTDIRGNVSLADHAHLNLTGLATLNGNL 

730 740 750 760 770 780 

790 800 810 820 830 840 

orf 1-1 . pep SANGDTRYTVSHNATQNGNLSLVGNAQATFNQATLNGNTSASGNASFNLSDHAVQNGSLT 
>l:f ll:MI::||||||||||!tltilllliMIMIIIIi I I i I I I I : : I I I I I | | | 
orflng-1 SAGGDTHYTVTRNATQNGNLSLVGNAQATFNQATLNGNTSASDNASFNLSNNAVQNGSLT 



790 800 810 820 830 840 



orf 1-1 .pep 
orf lng-1 

orf 1-1 .pep 
orf lng-1 

orf 1-1 .pep 
orf lng-1 

orf 1-1 .pep 
orf lng-1 

orf 1-1 . pep 
orf lng-1 

orf 1-1 .pep 
orf lng-1 



850 860 870 880 890 900 

LSGNAKANVSHSALNGNVSLADKAVFHFESSRETGQISGGKDTALHLKDSEWTLPSGTEL 
|| | | i | M | ! I i j I I I i I I I I I I I 1 1 i I : ! I M 1 : I I t I I I I I I ! I I M I I I 1 1 M II I 
LSDNAKANVSHSALNGNVSLADKAVFHFENSRFTGKISGGKDTALHLKDSEWTLPSGTEL 

850 860 870 880 890 900 

910 920 930 940 950 960 

GNLNLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRSLLSVTPPTSVESRFNTLT 

I | I I | M I I M M 1 I 1 I i i i t I I I I I I I : t II M II I I I I t I I I III I : ! H I t I I I 

GN LN L DN AT I T LN S AYRH DAAGAQTG S AADAPRRRS R RSLLSVTPPTSAESRFNTLT 

910 920 930 940 950 

970 980 990 1000 1010 1020 

VNGKLNGQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPASLEQLTWEGKDN 

M I I M M I I I i I M I I i I I I I I I 1 I I I I I I I I I t I I I M I I I I I : I I I I I I I I I I I I I 
VNGKLNGQGTFRFMSELFGYRSGKLKLAESSEGTYTLAVNNTGNEPVSLEQLTWEGKDN 
960 970 980 990 1000 1010 

1030 1040 1050 1060 1070 
KPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKA 

j f I I I I I 11 II I M I I I I I I II I I I I I I M I I I I I I I II I I I I I I I I II 
TPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAGETEAALTAK 
1020 1030 1040 1050 1060 1070 

1080 1090 1100 1110 1120 

EAKKQAEKDNAQSLDALIAAGRDAVEKTESVAEPARQAGGENVGIMQAEEEKKRVQ 

I I : I I I I 1 I II I II I I I I I 1 I • I : I I : I I I I I I I I I I II I I : i I I I I M I M M I 
QAQLAAKQQAEKDNAQS LDALI AAGRNATEKAE SVAE PARQAGGENAGIMQAEEEKKRVQ 
1080 1090 1100 1110 1120 1130 

1130 1140 1150 1160 1170 1180 

ADKDTALAKQREAETRPATTAFPRARRARRDLPQLQPQPQPQPQRDLISRYANSGLSEFS 
i I II I I I I I I I I I I I I I M I I I I II I I I I II I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I 
ADKDTALAKQREAETRPATTAFPRARRARRDLPQPQPQPQPQPQRDLISRYANSGLSEFS 
1140 1150 1160 1170 1180 1190 






1190 1200 1210 1220 1230 1240 

orf 1-1 . pep ATLNSVFAVQDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLG 
I M I I I I I I I I I I I I I II I I 1 I I I I M I I I I I I I I I I I I I II II I I I I I I i II I I I I I I I 
orf lng-1 AT LN S V FAVQ DE L DRV FAE DRRN A VWT S G I R D T KH YR S QD FRA YRQQT D LRQ I GMQKN LG 

1200 1210 1220 1230 1240 1250 

1250 1260 1270 1280 1290 1300 

orf 1-1. pep SGRVGILFSHNRTENTFDDGIGNSARLAHGAVFGQYGIDRFYIGISAGAGFSSGSLSDGI 
I I I I I I I I I I I I I I I I I I I II I I I I I II I I I M I I I I II I I I II I II I I I I II I I I I 
orf lng-1 SGRVGILFSHNRTGNTFDDGIGNSARLAHGAVFGQYGIGRFDIGISAGAGFSSGSLSDGI 
1260 1270 1280 1290 1300 1310 

1310 1320 1330 1340 1350 1360 

orf 1-1 . pep GGK I RRRVLH YG I QAR YRAG FGG FG I E PH IGATR Y FVQKAD YR YEN VN I AT PG LAFN RYR 

M I I I I I I I I I I I I I I I I I I M I I I I I I I I II II I I I I I I I I II I I II I M I I I I I I II 
orf lng-1 RGKIRRRVLHYGIQARYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGLAFNRYR 
1320 1330 1340 1350 1360 1370 






1370 1380 1390 1400 1410 1420 

orf 1-1 .pep AGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEI 
I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf lng-1 AGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEI 
1380 1390 1400 1410 1420 1430 






1430 1440 1450 

orf 1-1 .pep KG FTL S LHAAAAKG PQLE AQHS AG I KLG YRWX 

M I I II I I I I M I I I I I I I I I I I I I I I I I I M 
orf lng-1 KGFTLSLHAAAAKGPQLEAQHSAGIKLGYRWX 
1440 1450 1460 



In addition, ORF1ng shows 55.7% identity with hap protein (P45387) over a 1455 aa overlap: 
SCORES Init1: 1104 Initn: 4632 Opt: 2680 

Smith-Waterman score: 5165; 55.7% identity in 1455 aa overlap 

10 20 30 40 50 60 

MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQARAGHTYFGINYQYYRDFAEN 

t : | : | : 1 : | | : t I I t I t t I t I : 1 I I i I I I ! I I 
MKKT V FR LN FLT AC I S LG I V SQ AW AG HT Y FG I D YQ Y Y R D FAEN 
10 20 30 40 



5 orflng-l.pep 
p45387 



iq 70 80 90 100 110 120 

orflng-1 pep KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALAGDQYIVSVAHNGGYN 
I t I I : I I I : : I : I 1 I I : I : I I I I ! I I I I M M M I I I I 11 III : : I I I I I I I I I II: 
p45387 KGKFTVGAQNIKVYNKQGQLVGTSMTKAPMIDFSWSRNGVAALVENQYIVSVAHNVGYT 
50 60 70 80 90 100 


130 140 150 160 170 180 

orflng-1 . pep NVDFGAEGSNPDQHRFSYQIVKRNNYKAGTNGHPYGGDYHMPRLHKFVTDAEPVEMTSYM 
: | | | | I I I : II I I I I I : I : I I I f I M I I Mi Ml I I I I I I I I : t I : : I M I 
p45387 DVDFGAEGNNPDQHRFTYKIVKRNNYKKD-NLHPYEDDYHNPRLHKFVTEAAPIDMTSNM 
20 110 120 130 140 150 160 

190 200 210 220 230 240 

orflng-1 . pep DGWKYADLNKYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAYSWLVGGNTFAQNGSGG 
: t |:| : I I I : I It I I : I I I : I I : I : I : ■* : : I : I I : I : : I I I I I : I = 

25 p45387 NGSTYSDRTKYPERVRIGSGRQFWRNDQDKGD QVAGAYHYLTAGNTHNQRGAGN 

170 180 190 200 210 

250 260 270 280 290 300 

orflng-1 . pep GTVNLGSEKIKHSPYGFLPTGGSFGDSGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGF 
30 I II:: I : ! ! I I : M I I I I I 1 II I I I 1 : Ml II I I I : I : i I i : I I i M 

p45387 GYSYLGGDVRKAGEYGPLPIAGSKGDSGSPMFIYDAEKQKWLINGILREGNPFEGKENGF 
220 230 240 250 260 270 

310 320 330 340 350 360 

35 orflng-l.pep QLVRKDWFYDEIFAGDTHSVFYEPHQNGKYFFNDNNNGAGKIDAKHKHYSLPYRLKTRTV 

t I I I I : : I I t I I I I : : I I t I : : ! : I I I : I I : : I : : I : 

p4 5387 QLVRKS YF- DE I FERDLHTSLYTRAGNGVYT I SGNDNGQGS I TQKS GIPSEIK I 

280 290 300 310 320 



40 370 380 390 400 410 419 

orflng-1 . pep QLFNVSLSETAREPVYHAA-GGVNSYRPRLNNGENISFIDKGKGELILTSNINQGAGGLY 
I |:|| : : I : : 1 I I I I I I I I I : : i : I : : I I I : : I : I I I f I I I I I 

p4 5387 TLANMSLPLKEKDKVHNPRYDGPNIYSPRLNNGETLYFMDQKQGSLIFASDINQGAGGLY 
330 340 350 360 370 380 


420 430 440 450 460 470 479 

orflng-1 . pep FEGNFTVSPKNNETWQGAGVHISDGSTVTWKVNGVANDRLSKIGKGTLLVQAKGENQGSV 
t t I I I I I I I : : I : ! I It I I : I : I : : I I i I 1 I I I I I : I t I I I I t I I t I I I t I I I I : I I : 
p4 5387 FEGNFTVSPNSNQTWQGAGIHVSENSTVTWKVNGVEHDRLSKIGKGTLHVQAKGENKGSI 
50 390 400 410 420 430 440 



480 490 500 510 520 530 539 

orflng-1 . pep SVGDGKVILDQQADDQGKKQAFSEIGLVSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNG 
I I I I M f i I : I M I I I I : I I I I I I I I I I I I i I I I I I ( i : I I : I I : I I I I I I I M I I I I 
55 p45387 SVGDGKVILEQQADDQGNKQAFSEIGLVSGRGTVQLNDDKQFDTDKFYFGFRGGRLDLNG 

450 460 470 480 490 500 



540 550 560 570 580 590 

orflng-1 . pep HSLSFHRIQNTDEGAMIVNHNQDKESTVTITGNKDITT-TGNN-NKLDSKKEIAYNGWFG 
60 t I I : I : I M I II I I II I II I I : : : I I I I I I : : I : : I I I I : I I : I I I I t I I I I I 

p4 5387 HSLTFKRIQNTDEGAMIVNHNTTQAANVTITGNESIVLPNGNNINKLDYRKEIAYNGWFG 

510 520 530 540 550 560 



600 610 620 630 640 650 

65 orflng-1 . pep EKDATKTNGRLNLNYQPEEADRTLLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLGSG 

I I : ! I M M I I : I ( I II I I I I t I t I : I : I I I I : M i II I I I i I I I I I I I I : : 
p4 5387 ETDKNKHNGRLNLIYKPTTEDRTLLLSGGTNLKGDITQTKGKLFFSGRPTPHAYNHLNKR 
570 580 590 600 610 620 






660 670 680 690 700 710 



orf lng-1 .pep 
p45387 



orf lng-1 .pep 
p45387 



orf lng-1 .pep 
p45387 



orf lng-1 . pep 
p45387 



WSKMEGIPQGEIVWDNDWIDRTFKAENFHIQGGQAWSRNVAKVEGDWHLSNHAQAVFGV 

M : M M I M M M M M M M M M I M M I M M M M M : : M : ! : I I : I : I : I I I 
WSEMEGIFQGEIVWDHDWINRTFKAENFQIKGGSAWSRNVSSIEGNWTVSNNANATFGV 

630 640 650 660 670 680 

720 730 740 750 760 770 

APHQSHTICTRSDWTGLTSCTEKTITDDKVIASLSKTDIRGNVSLADHAHLNLTGLATLN 

: I : 1 : : I 1 i t t t I I I I I I : i : : I I It t M I I : ! I : : : I : I : I ! : t II I I 
VPNQQNTICTRSDWTGLTTCQKVDLTDTKVINSIPKTQINGSINLTDNATANVKGLAKLN 

690 700 710 720 730 740 

780 790 800 810 820 830 

GNLSAGGDTHYTVTRNATQNGNLSLVGNAQATFNQATLNGNTSASDNASFNLSNNAVQNG 
||:: : : : : : I : 11 I I I : I 1 

GNVTL TNHSQFTLSNNATQIG 

750 "760 770 

840 850 860 870 880 890 

SLTLSDNAKANVSHSALNGNVSLADKAVFHFENSRFTGKISGGKDTALHLKDSEWTLPSG 
: : MM: MM:: II I M |: M I I : : I M M MM I M : M : : I M M 
NIRLSDNSTATVDNANLNGNVHLTDSAQFSLKNSHFSHQIQGDKGTTVTLENATWTMPSD 
780 790 800 810 820 830 






900 910 920 930 940 950 

orf lng-1 .pep TELGNLNLDNATITLNSAYRHDAAGAQTGSAADAPRRRSRRSLLSVTPPTSAESRFNTLT 

I I M M : M M M M M : : M : : I M M I : I Mill M M M 

p45387 TTLQNLTLNNSTITLNSAY SASSNNTPRRRS LETETTPTSAEHRFNTLT 

840 850 860 ' 870 






960 970 980 990 1000 1010 

orf lng-1 . pep VNGKLWGQGTFRFMSELFGYRSGKLKLAESSEGTYTLAVNNTGNEPVSLEQLTVVEGKDN 
M II M M M M I I M M M I M M : : M I I Ml M M M : I I'M M I M I M 
p4 5387 VNGKLSGQGTFQFTSSLFGYKSDKLKLSNDAEGDYILSVRNTGKEPETLEQLTLVESKDN 
880 890 900 910 920 930 

1020 1030 1040 1050 1060 1070 

orf lng-1. pep TPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAGETEAALTAK 
I I I : : I : I I I : I : I I I I II I M M : : I II M I M M I II M : I : I : : I : I M 
p45387 QPLSDKLKFTLENDHVDAGALRYKLVKNDGEFRLHNPIKEQELHNDLVRAEQAERTLEAK 
940 950 960 970 980 990 






1080 1090 1100 1110 1120 1130 

orf lng-1 .pep QAQ L AAKQ Q AE KDNAQSLDALI AAGRN AT - E KAE S VAE P ARQ AGGEN AG I MQ AE E E KKRV 
M : : M M : : : M I II : : : : : I MM : I : : : : : I : I 
p4 5387 QVEPTAKTQTGEPKVRSRRAARAAFPDTLPDQSLLNALEAKQAE-LTAETQKSKAKTKKV 
1000 1010 1020 1030 1040 1050 






1140 1150 1160 1170 1180 1190 

orf lng-1, pep QADK DTALAKQREAETRPATTAFPRARRARRD-LPQPQPQPQPQPQRDLISRYANSG 

: : : : I I : I : : :::::{ I I : : I : I : I II M M I M 

p45387 RSKRAVFS DPLLDQSLFALEAALE VI DAPQQSEKDRLAQEEAEKQ-RKQKDL I SRYSNSA 

1060 1070 1080 1090 1100 1110 






1200 1210 1220 1230 1240 1250 

orf lng-1 . pep LSEFSATLNSVFAVQDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQ-TDLRQIG 
M M I I M M : : : II M M M M : : : : I II M : I : : I I : M I M II M M I M 
p45387 LSELSATVNSMLSVQDELDRLFVDQAQSAVWTNIAQDKRRYDSDAFRAYQQQKTNLRQIG 
1120 1130 1140 1150 1160 1170 

1260 1270 1280 1290 1300 1310 

orf lng-1 .pep MQKNLGSGRVGILFSHNRTGNTFDDGIGNSARLAHGAVFGQYGIGRFDIGISAGAGFSSG 
Ml M : I M I : I I M M I II M : I I I : : M I I I : : : M : : I M : M : 
p45387 VQKALANGRIGAVFSHSRSDNTFDEQVKNHATLTMMSGFAQYQWGDLQFGVNVGTGISAS 
1180 1190 1200 1210 1220 1230 

1320 1330 1340 1350 1360 1370 

orf lng-1 . pep SLSDGIRGKIRRRVLHYGIQARYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGL 
: : : : II : M : : : M : : I I : : I : M : I : M : : II M : : : I : Ml : MM 

P4 5387 KMAEEQSRKIHRKAINYGVNASYQFRLGQLGIQPYFGVNRYFIERENYQSEEVRVKTPSL 
1240 1250 1260 1270 1280 1290 
1380 1390 1400 1410 1420 1430 

orflng-1.pep AFNRYRAGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEW 
g * P P MIM ( | ( ::lt:| |:::||: II: M :M I II: : I . 

p45387 AFNRYNAGIRVDYTFTPTDNISVKPYFFVNYVDVSNANVQTTVNLTVLQQPFGRYWQKEV 

1300 1310 1320 1330 1340 1350 

1440 1450 1460 1469 

orf lng-1 . pep GVNAEIKGFTLSLHAAAAKGPQLEAQHSAGIKLGYRWX 
10 l::IM t :l : : : I I I I : : : I : I M I I I 

p45387 GLKAEILHFQISAFISKSQGSQLGKQQNVGVKLGYRW 
1360 1370 1380 1390 

Based on this analysis, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 78 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 655>: 

1 . .AAGGTGTGGC AATTTGTCGA AGA.CCGCTG CGTGCCGTCG TGCCTGCCGA 

51 CAGTTTTGAA CCGACCGCGC AAAAATTGAA CCTGTTTAAG GCGGGTGCGG 

101 CAACCATTTT GTTTTATGAA GATCAAAATG TCGTCAAAGG TTTGCAGGAG 

151 CAGTTCCCTG CTTATGCCGC TAACTTCCCC GTTTGGGCGg ATCAGGCAAA 

201 CGCGATGGTG CAGTATGCCG TTTGGACGAC ACTTGCCGCG GTCGGCGTAG 

251 GTGCAAACCT GCAACATTAC AATCCCTTGC CCGATGCGGC GATTGCCAAA 

301 GCGTGGAATA TCCCCGAAAA CTGGTTGTTG CGCGCACAAA TGGTTATCGG 

351 CGGTATTGAA GGGGCGGCAG GTGAAAAGAC CTTTGAACCC GTTGCAGAAC 

401 GTTTGAAAGT GTTCGGCGCA TAA 

This corresponds to the amino acid sequence <SEQ ID 656; ORF6>: 

1 ..KVWQFVEXPL RAVVPADSFE PTAQKLNLFK AGAATILFYE DQNVVKGLQE 
51 QFPAYAANFP VWADQANAMV QYAVWTTLAA VGVGANLQHY NPLPDAAIAK 
101 AWNIPENWLL RAQMVIGGIE GAAGEKTFEP VAERLKVFGA * 

Further sequence analysis revealed a further partial DNA sequence <SEQ ID 657>: 

1 . . CTGCGTGCCG TCGTGCCTGC CGACAGTTTT GAACCGACCG CGCAAAAATT 

51 GAACCTGTTT AAGGCGGGTG CGGCAACCAT TTTGTTTTAT GAAGATCAAA 

101 ATGTCGTCAA AGGTTTGCAG GAGCAGTTCC CTGCTTATGC CGCTAACTTC 

151 CCCGTTTGGG CGGATCAGGC AAACGCGATG GTGCAGTATG CCGTTTGGAC 

201 GACACTTGCC GCGGTCGGCG TAGGTGCAAA CCTGCAACAT TACAATCCCT 

251 TGCCCGATGC GGCGATTGCC AAAGCGTGGA ATATCCCCGA AAACTGGTTG 

301 TTGCGCGCAC AAATGGTTAT CGGCGGTATT GAAGGGGCGG CAGGTGAAAA 

351 GACCTTTGAA CCCGTTGCAG AACGTTTGAA AGTGTTCGGC GCATAA 

This corresponds to the amino acid sequence <SEQ ID 658; ORF6-1>: 

1 ..LRAVVPADSF EPTAQKLNLF KAGAATILFY EDQNVVKGLQ EQFPAYAANF 

51 PVWADQANAM VQYAVWTTLA AVGVGANLQH YNPLPDAAIA KAWNIPENWL 
101 LRAQMVIGGI EGAAGEKTFE PVAERLKVFG A* 
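
The correspondence between a nucleotide sequence and the amino acid sequence it encodes can be checked by conceptual translation. The following is a minimal sketch, assuming the standard genetic code and a reading frame that begins at the first nucleotide supplied; the name `translate` is illustrative only and is not part of the original analysis.

```python
# Standard genetic code, built in the conventional TCAG codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; codons containing unknown bases become 'X'."""
    seq = "".join(dna.split()).upper()  # drop the spaces between 10-base blocks
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[pos:pos + 3], "X"))
    return "".join(protein)


# First 30 and last 9 nucleotides of the partial sequence <SEQ ID 657>.
print(translate("CTGCGTGCCG TCGTGCCTGC CGACAGTTTT"))  # LRAVVPADSF
print(translate("GGCGCATAA"))                          # GA*
```

The stop codon is reported as `*`, matching the notation used in the sequence listings.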

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 
ORF6 shows 98.6% identity over a 140aa overlap with an ORF (ORF6a) from strain A of N. 
meningitidis: 

10 20 30 

orf 6 . pep KVWQFVEXPLRAWPADSFE PTAQKLNLFK 

I I I I I I I I I I I I I I I I I I I I I I I I II I I 
50 orf 6a QIVEHAVLHTPSSFNSQSARVWLFGEEHDKVWQFVEDALRAWPADSFE PTAQKLNLFK 
40 50 60 70 80 90 

40 50 60 70 80 90 

orf6 pep AGAAT I L F YE DQN WKG LQEQ F P A Y AAN F PVW ADQAN AMVQ Y AVWT TLAAVG VGAN LQH Y 

5 lilMHUIiiMIIIIMiliniilNMIIilNIHtMIIIIMItltllllfl 
orf6a AGAAT I LFYE DQN WKGLQEQFPAYAAN FPVWADQANAMVQ YAVWTT LAAVGVG ANLQHY 

100 110 120 130 140 150 

100 110 120 130 140 

10 orf 6 . pep N PL P D AAI AKAWN I PEN W LLRAQMV I GG I E GAAGEKT FE P VAE RLKV FG AX 

I I I I t I I I t t I I M I I I I I I I I I I M I I I 1 I I I I I I 1 ! M I I I 1 > I I I M I 
orf 6a NPLPDAAIAKAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGAX 
160 170 180 190 200 
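
The identity figures quoted for these alignments are the proportion of aligned positions that carry the same residue. The following is a minimal sketch of that arithmetic, assuming an ungapped alignment of equal-length strings and treating the unknown residue `X` as a mismatch; the function name `percent_identity` is hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of identical positions between two equal-length aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return round(100.0 * matches / len(a), 1)


# First ten aligned residues of ORF6 and ORF6a: positions 8 and 9 differ.
print(percent_identity("KVWQFVEXPL", "KVWQFVEDAL"))  # 80.0

# If those two positions are the only differences in the 140 aa overlap,
# the identity is 138/140, consistent with the reported figure.
print(round(100.0 * 138 / 140, 1))  # 98.6
```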

The complete length ORF6a nucleotide sequence <SEQ ID 659> is: 

1 ATGACCCGTC AATCTCTGCA ACAGGCTGCC GAAAGCCGCC GTTCCATTTA 

51 TTCGTTAAAT AAAAATCTGC CCGTCGGCAA AGATGAAATC GTCCAAATCG 

101 TCGAACACGC CGTTTTGCAC ACACCTTCTT CGTTCAATTC CCAATCTGCC 

151 CGTGTGGTCG TGCTGTTTGG CGAAGAGCAT GATAAGGTGT GGCAATTTGT 

201 CGAAGACGCG CTGCGTGCCG TCGTGCCTGC CGACAGTTTT GAACCGACCG 

251 CGCAAAAATT GAACCTGTTT AAGGCGGGTG CGGCAACTAT TTTGTTTTAT 

301 GAAGATCAAA ATGTCGTCAA AGGTTTGCAG GAGCAGTTCC CTGCTTATGC 

351 CGCCAACTTT CCCGTTTGGG CGGACCAGGC GAACGCGATG GTGCAGTATG 

401 CCGTTTGGAC GACACTTGCC GCGGTCGGCG TAGGTGCAAA CCTGCAACAT 

451 TACAATCCCT TGCCCGATGC GGCGATTGCC AAAGCGTGGA ATATCCCCGA 

501 AAACTGGTTG TTGCGCGCAC AAATGGTTAT CGGCGGTATT GAAGGGGCGG 

551 CAGGTGAAAA GACCTTTGAA CCAGTTGCAG AACGTTTGAA AGTGTTCGGC 

601 GCATAA 

This is predicted to encode a protein having amino acid sequence <SEQ ID 660>: 

1 MTRQSLQQAA ESRRSIYSLN KNLPVGKDEI VQIVEHAVLH TPSSFNSQSA 
51 RVVVLFGEEH DKVWQFVEDA LRAVVPADSF EPTAQKLNLF KAGAATILFY 

101 EDQNVVKGLQ EQFPAYAANF PVWADQANAM VQYAVWTTLA AVGVGANLQH 
151 YNPLPDAAIA KAWNIPENWL LRAQMVIGGI EGAAGEKTFE PVAERLKVFG 
201 A* 

ORF6a and ORF6-1 show 100.0% identity in 131 aa overlap: 

50 60 70 80 90 100 

orf 6a . pep T P S S FN S QS ARV WL FGE E H DKVWQFVE D ALRA W P AD S FE PT AQKLN L FKAG AAT I L F Y 

I II I I I I I I I I I t I I I I I t I I I I I I I I I M 
orf 6-1 LRAWPADS FEPTAQKLNLFKAGAAT ILFY 

40 10 20 30 

110 120 130 140 150 160 

orf 6a . pep E DQN WKGLQEQFPAYAAN FPVWADQANAMVQYAVWTT LAAVGVGAN LQHYN PLP DAAI A 
I i I t I I I i I I I 1 I I I I I t I I I I 1 I I 1 ! I I M I I i I I I I I i I I I 1 I I I I I I I i II t I I I i i 
45 orf 6-1 EDQNWKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHYNPLPDAAIA 

40 50 60 70 80 90 

170 180 190 200 

orf 6a . pep KAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGAX 
50 I I M I I I I I I I I 1 I II I 1 M I I ) I I I I I I I I I 1 I I I I I I I I I 

orf 6-1 KAWN I PEN W L LRAQMV I GG I EG AAGE KT FE P VAE RLKV FGAX 

100 110 120 130 

Homology with a predicted ORF from N. gonorrhoeae 
ORF6 shows 95.7% identity over a 140aa overlap with a predicted ORF (ORF6ng) from 
N. gonorrhoeae: 






orf 6. pep KVWQFVEXPLRAWPADSFEPTAQKLNLFK 30 

I I I I I M I I I I I I I I II I I I I I t i : I I I 
orf 6ng SNVSLDMSNPTVLRMGLPLYIASLRRGAIYKVWQFVEDALRAWPADSFEPTAQKLKLFK 64 
orf6.pep AGAATILFYEDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHY 90 

|MIMI!IHi||||||llillllMIIIIII!lttlllllltllltl!ll:tMHM 
o r f 6ng AGAAT I LFYEDQN WKGLQEQFPAYAAN FPVWADQANAMVQYAVWTTLAAVGAGANLQHY 124 

f6 N P L P D AAI AKAWN I P EN W L LRAQMV I GG I E G AAG E KT FE P V AERLKV FG A 140 

|}'|||:|||| III II llll Illtttl II I II 11111:1111 III III IN 
orf6ng NPLPDVAIAKAWNIPENWLLRAQMVIGGIEGAAGEKVFEPVAERLKVFGA 174 



The complete length ORF6ng nucleotide sequence <SEQ ID 661> was identified as: 

1 ATGGCCGTTG CGTCAAATGT CAGCTTGGAT ATGTCCAATC CTACGGTGTT 

51 ACGCATGGGA TTACCCTTAT ATATTGCGTC CCTAAGAAGG GGCGCAATAT 

101 ATAAGGTGTG GCAATTTGTC GAAGACGCGC TGCGTGCCGT CGTGCCTGCC 

151 GACAGTTTTG AACCGACCGC GCAAAAATTG AAGCTGTTTA AGGCGGGCGC 

201 GGCAACCATT TTGTTTTATG AAGATCAAAA TGTCGTCAAA GGTTTGCAGG 

251 AGCAGTTCCC TGCTTATGCC GCCAACTTTC CCGTTTGGGC GGACCAGGCG 

301 AACGCTATGG TACAGTATGC CGTCTGGACG ACACTTGCCG CGGTCGGTGC 

351 AGGTGCAAAT CTGCAACATT ACAACCCCTT GCCCGATGTG GCGATTGCTA 

401 AAGCGTGGAA TATTCCCGAA AACTGGCTGT TGCGCGCGCA AATGGTTATC 

451 GGTGGTATTG AAGGGGcggc aggtgaaaaa gtctttgaac CCGTTGCgga 

501 acgtttgAAA GTGTTCGGCG CATAA 

This encodes a protein having amino acid sequence <SEQ ID 662>: 



1 MAVASNVSLD MSNPTVLRMG LPLYIASLRR GAIYKVWQFV EDALRAVVPA 

51 DSFEPTAQKL KLFKAGAATI LFYEDQNVVK GLQEQFPAYA ANFPVWADQA 

101 NAMVQYAVWT TLAAVGAGAN LQHYNPLPDV AIAKAWNIPE NWLLRAQMVI 

151 GGIEGAAGEK VFEPVAERLK VFGA* 



ORF6ng and ORF6-1 show 96.9% identity in 131 aa overlap: 

                                                  10        20        30
orf6-1.pep                                LRAVVPADSFEPTAQKLNLFKAGAATILFY
                                          |||||||||||||||||:||||||||||||
orf6ng      PTVLRMGLPLYIASLRRGAIYKVWQFVEDALRAVVPADSFEPTAQKLKLFKAGAATILFY
                 20        30        40        50        60        70



40 50 60 70 80 90 

35 orf 6-1 . pep E D QN VVKG LQE Q F P AYAAN F P VW ADQAN AMVQ Y AVWT T LAAVG VG AN L QH YN PL P D AA I A 

II M I I M I i I II I II I I I I I II I I I I I I I M I I i I I I I I I i I : I I I I I I I i I I I I : I I I 
orf 6ng EDQNWKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGAGANLQHYNPLPDVAIA 
80 90 100 110 120 130 



40 100 110 120 130 

orf 6-1 . pep KAWNIPENWLLRAQMVIGGIEGAAGEKT FEPVAERLKVFGAX 
I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I i I I I I 
0rf6ng KAWN I PEN WL LRAQMV I GG I EGAAGEKV FEPVAERLKVFGAX 

140 150 160 170 


It is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could 
be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 79 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 663>: 

1 ..GGCTACAACT ACCTGTTCGC GCGCGGCAGC CGCATCGCCA ACTACCAAAT 

51 CAACGGCATC CCCGTTGCCG ACGCGCTGGC CGATACGGG£ CAATGCCAAC 

101 ACCGCCGCCT ATGAGCGCGT AGAAGTCGTG CGCGGCGTGG CGGGGCTGCT 

151 GGACGGCACG GGCGAGCCTT CCGCCACCGT CAATCTGGTG CGCAAACGCC 

201 TGACCCGCAA GCCATTGTTT GAAGTCCGCG CCGAAGCgGG CAACCGcAAA 

251 CATTTCGGGC TGGACGCGGA CGTATCGGGC AGCCTGAACA CCGAAG.crC 

301 rCTGCGCgGC CGCCTGGTTT CCAcCTTCGG ACGCGGCGAC TCGTGGCGGC 
351 GGCGCGAACG CAGCCGskAT GCCGAACTCT ACGGCATTTT GGAATACGAC 

401 ATCGCACCGC AAACCCGCGT CCACGCArGC ATGGACTACC AGCAGGCGAA 

451 AGAAACCGCC GACGCGCCGC TCAGcTACGC CGTGTACGAC AGCCAAGGTT 

501 ATGCCACCGC CTTCGGCCCG AAAGACAACC CCGCCACAAA TTGGGCGAAC 

551 AGCCACCACC GTGCGCTCAA CCTGTTCGCC GGCATCGAAC ACCGCTTCAA 

601 CCAAGACTGG AAACTCAAAG CCGAATACGA CTAC.. 

This corresponds to the amino acid sequence <SEQ ID 664; ORF23>: 

1 ..GYNYLFARGS RIANYQINGI PVADALADTG NANTAAYERV EVVRGVAGLL 

51 DGTGEPSATV NLVRKRLTRK PLFEVRAEAG NRKHFGLDAD VSGSLNTEXX 

101 LRGRLVSTFG RGDSWRRRER SRXAELYGIL EYDIAPQTRV HAXMDYQQAK 

151 ETADAPLSYA VYDSQGYATA FGPKDNPATN WANSHHRALN LFAGIEHRFN 

201 QDWKLKAEYD Y. . 

Further work revealed the complete nucleotide sequence <SEQ ID 665>: 

1 ATGACACGCT TCAAATATTC CCTGCTGTTT GCCGCCCTGT TGCCCGTGTA 

51 CGCGCAGGCC GATGTTTCTG TTTCAGACGA CCCCAAACCG CAGGAAAGCA 

101 CTGAATTGCC GACCATCACC GTTACCGCCG ACCGCACCGC GAGTTCCAAC 

151 GACGGCTACA CTGTTTCCGG CACGCACACC CCGCTCGGGC TGCCCATGAC 

201 CCTGCGCGAA ATCCCGCAGA GCGTCAGCGT CATCACATCG CAACAAATGC 

251 GCGACCAAAA CATCAAAACG CTCGACCGCG CCCTGTTGCA GGCGACCGGC 

301 ACCAGCCGCC AGATTTACGG CTCCGACCGC GCGGGCTACA ACTACCTGTT 

351 CGCGCGCGGC AGCCGCATCG CCAACTACCA AATCAACGGC ATCCCCGTTG 

401 CCGACGCGCT GGCCGATACG GGCAATGCCA ACACCGCCGC CTATGAGCGC 

451 GTAGAAGTCG TGCGCGGCGT GGCGGGGCTG CTGGACGGCA CGGGCGAGCC 

501 TTCCGCCACC GTCAATCTGG TGCGCAAACG CCTGACCCGC AAGCCATTGT 

551 TTGAAGTCCG CGCCGAAGCG GGCAACCGCA AACATTTCGG GCTGGACGCG 

601 GACGTATCGG GCAGCCTGAA CACCGAAGGC ACGCTGCGCG GCCGCCTGGT 

651 TTCCACCTTC GGACGCGGCG ACTCGTGGCG GCGGCGCGAA CGCAGCCGCG 

701 ATGCCGAACT CTACGGCATT TTGGAATACG ACATCGCACC GCAAACCCGC 

751 GTCCACGCAG GCATGGACTA CCAGCAGGCG AAAGAAACCG CCGACGCGCC 

801 GCTCAGCTAC GCCGTGTACG ACAGCCAAGG TTATGCCACC GCCTTCGGCC 

851 CGAAAGACAA CCCCGCCACA AATTGGGCGA ACAGCCGCCA CCGTGCGCTC 

901 AACCTGTTCG CCGGCATCGA ACACCGCTTC AACCAAGACT GGAAACTGAA 

951 AGCCGAATAC GACTACACCC GCAGCCGCTT CCGCCAGCCC TACGGCGTAG 

1001 CAGGCGTGCT TTCCATCGAC CACAACACCG CCGCCACCGA CCTGATTCCC 

1051 GGTTATTGGC ACGCCGACCC GCGCACCCAC AGCGCCAGCG TGTCATTGAT 

1101 CGGCAAATAC CGCCTGTTCG GCCGCGAACA CGATTTAATC GCGGGTATCA 

1151 ACGGTTACAA ATACGCCAGC AACAAATACG GCGAACGCAG CATCATCCCC 

1201 AACGCCATTC CCAACGCCTA CGAATTTTCC CGCACGGGTG CCTACCCGCA 

1251 GCCTGCATCG TTTGCCCAAA CCATCCCGCA ATACGGCACC AGGCGGCAAA 

1301 TCGGCGGCTA TCTCGCCACC CGTTTCCGCG CCGCCGACAA CCTTTCGCTG 

1351 ATTTTGGGCG GACGATACAC CCGTTACCGC ACCGGCAGCT ACGACAGCCG 

1401 CACACAAGGC ATGACCTATG TGTCCGCCAA CCGTTTCACC CCCTACACAG 

1451 GCATCGTGTT CGACCTGACC GGCAACCTGT CTCTTTACGG CTCGTACAGC 

1501 AGCCTGTTCG TCCCGCAATC GCAAAAAGAC GAACACGGCA GCTACCTGAA 

1551 ACCCGTAACC GGCAACAATC TGGAAGCCGG CATCAAAGGC GAATGGCTTG 

1601 AAGGCCGTCT GAACGCATCC GCCGCCGTGT ACCGCGCCCG TAAAAACAAC 

1651 CTCGCCACCG CAGCAGGACG CGACCCGAGC GGCAACACCT ACTACCGCGC 

1701 CGCCAACCAA GCCAAAACCC ACGGCTGGGA AATCGAAGTC GGCGGCCGCA 

1751 TCACGCCCGA ATGGCAGATA CAGGCAGGTT ACAGCCAAAG CAAAACCCGC 

1801 GACCAAGACG GCAGCCGCCT GAACCCCGAC AGCGTACCCG AACGCAGCTT 

1851 CAAACTCTTC ACTGCCTACC ACTTTGCCCC CGAAGCCCCC AGCGGCTGGA 

1901 CCATCGGCGC AGGCGTGCGC TGGCAGAGCG AAACCCACAC CGACCCTGCC 

1951 ACGCTCCGCA TCCCCAACCC CGCCGCCAAA GCCCGCGCCG CCGACAACAG 

2001 CCGCCAAAAA GCCTACGCCG TCGCCGACAT CATGGCGCGT TACCGCTTCA 

2051 ATCCGCGCGC CGAACTGTCG CTGAACGTGG ACAATCTGTT CAACAAACAC 

2101 TACCGCACCC AGCCCGACCG CCACAGCTAC GGCGCACTGC GGACAGTGAA 

2151 CGCGGCGTTT ACCTATCGGT TTAAATAA 

This corresponds to the amino acid sequence <SEQ ID 666; ORF23-1>: 

1 MTRFKYSLLF AALLPVYAQA DVSVSDDPKP QESTELPTIT VTADRTASSN 

51 DGYTVSGTHT PLGLPMTLRE IPQSVSVITS QQMRDQNIKT LDRALLQATG 

101 TSRQIYGSDR AGYNYLFARG SRIANYQING IPVADALADT GNANTAAYER 

151 VEVVRGVAGL LDGTGEPSAT VNLVRKRLTR KPLFEVRAEA GNRKHFGLDA 

201 DVSGSLNTEG TLRGRLVSTF GRGDSWRRRE RSRDAELYGI LEYDIAPQTR 



251 VHAGMDYQQA KETADAPLSY AVYDSQGYAT AFGPKDNPAT NWANSRHRAL 

301 NLFAGIEHRF NQDWKLKAEY DYTRSRFRQP YGVAGVLSID HNTAATDLIP 

351 GYWHADPRTH SASVSLIGKY RLFGREHDLI AGINGYKYAS NKYGERSIIP 

401 NAIPNAYEFS RTGAYPQPAS FAQTIPQYGT RRQIGGYLAT RFRAADNLSL 

451 ILGGRYTRYR TGSYDSRTQG MTYVSANRFT PYTGIVFDLT GNLSLYGSYS 

501 SLFVPQSQKD EHGSYLKPVT GNNLEAGIKG EWLEGRLNAS AAVYRARKNN 

551 LATAAGRDPS GNTYYRAANQ AKTHGWEIEV GGRITPEWQI QAGYSQSKTR 

601 DQDGSRLNPD SVPERSFKLF TAYHFAPEAP SGWTIGAGVR WQSETHTDPA 

651 TLRIPNPAAK ARAADNSRQK AYAVADIMAR YRFNPRAELS LNVDNLFNKH 

701 YRTQPDRHSY GALRTVNAAF TYRFK* 



Computer analysis of this amino acid sequence gave the following results: 

Homology with the ferric-pseudobactin receptor PupB of Pseudomonas putida (accession number P38047) 
ORF23 and PupB protein show 32% aa identity in 205aa overlap: 

FARGSRIANYQINGIPVADALADTGNANTAAYERVEWRGVAGLLDGTGEPSATVNLVRK 65 
++RG I NY+++G+P + L D + + A ++RVE+VRG GL+ G G PSAT+NL+RK 



Orf23 


6 


PupB 


215 


Orf23 


66 


PupB 


274 


Orf23 


126 


PupB 


334 


Orf23 


184 


PupB 


392 



R T + 



EAGN 



+G 



DVSG L 



+YGI E+D++ T + 



D+PL 



+RGR V+ + 



S G 



N A +W+ 



+ H 



+ F IE + 



W K E 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF23 shows 95.7% identity over a 211aa overlap with an ORF (ORF23a) from strain A of N. 
meningitidis: 






10 20 30 

orf 23 .pep GYNYLFARGSRIANYQINGIPVADALADTG 

I I II I I I i I I I i M i I 11 I 1 I I i I f i f I I I 
orf 23a QMRDQNIKALDRALLQATGTSRQIYGSDRAGYNYLFARGSRIANYQINGIPVADALADTG 
90 100 110 120 130 140 

40 50 60 70 80 90 

or f 23 . pep N ANT AAYE RVE VVRG VAG L L DG TGE PS ATVN L VRKRLT RK PL FE VRAE AGNRKH FG L DAD 
I I I I I I I 1 I i I i I I I II I I I ! I t ( I I I 1 I I M I I I I I I I 1 I I I I I I I I I I I I I I II II 
orf 23a NANTAAYERVEWRGVAGLLDGTGEPSATVNLVRKRPTRKPLFEVRAEAGNRKHFGLGAD 
150 160 170 180 190 200 

100 110 120 130 140 150 

orf 23 . pep VSGSLNTEXXLRGRLVSTFGRGDSWRRRERSRXAELYGILEYDIAPQTRVHAXMDYQQAK 
111111:1 : I I II I I I II I I I I I I I : M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 23a VSGSLNAEGTLRGRLVSTFGRGDSWRQRERSRDAELYGILEYDIAPQTRVHAGMDYQQAK 
210 220 230 240 250 260 

160 170 180 190 200 210 

orf 23 . pep ETADAPLSYAVYDSQGYATAFGPKDNPATNWANSHHRALNLFAGIEHRFNQDWKLKAEYD 
I I I I I I I I I I II i I I I I I I I I I 11 i II I I f I II I : I I I i I ! I I I I f I I I I I I I I I i I I I I 
orf 23a ETADAPLSYAVYDSQGYATAFGPKDNPATNWANSRHRALNLFAGIEHRFNQDWKLKAEYD 
270 280 290 300 310 320 






orf 23. pep Y 
I 

orf 23a YTRSRFRQPYGVAGVLSIDHNTAATDLIPGYWHADPRTHSASVSLIGKYRLFGREHDLIA 
330 340 350 360 370 380 
The complete length ORF23a nucleotide sequence <SEQ ID 667> is: 

1 ATGACACGCT TCAAATATTC CCTGCTGTTT GCCGCCCTGT TGCCCGTGTA 

51 CGCGCAGGCC GATGTTTCTG TTTCAGACGA CCCAAAACCG CAGGAAAGCA 

101 CTGAATTGCC GACCATCACC GTTACCGCCG ACCGCACCGC GAGTTCCAAC 

151 GACGGCTACA CTGTTTCCGG CACGCACACC CCGCTCGGGC TGCCCATGAC 

201 CCTGCGCGAA ATCCCGCAGA GCGTCAGCGT CATCACATCG CAACAAATGC 

251 GCGACCAAAA CATCAAAGCG CTCGACCGCG CCCTGTTGCA GGCGACCGGC 

301 ACCAGCCGCC AGATTTACGG CTCCGACCGC GCGGGCTACA ACTACCTGTT 

351 CGCGCGCGGC AGCCGCATCG CCAACTACCA AATCAACGGC ATCCCCGTTG 

401 CCGACGCGCT GGCCGATACG GGCAATGCCA ACACCGCCGC CTATGAGCGC 

451 GTAGAAGTCG TGCGCGGCGT GGCGGGGCTG CTGGACGGCA CGGGCGAGCC 

501 TTCCGCCACC GTCAATCTGG TGCGCAAACG CCCGACCCGC AAGCCATTGT 

551 TTGAAGTCCG CGCCGAAGCG GGCAACCGCA AACATTTCGG GCTGGGCGCG 

601 GACGTATCGG GCAGCCTGAA TGCCGAAGGC ACGCTGCGCG GCCGCCTGGT 

651 TTCCACCTTC GGACGCGGCG ACTCGTGGCG GCAGCGCGAA CGCAGCCGCG 

701 ATGCCGAACT CTACGGCATT TTGGAATACG ACATCGCACC GCAAACCCGC 

751 GTCCACGCAG GCATGGACTA CCAGCAGGCG AAAGAAACCG CCGACGCGCC 

801 GCTCAGCTAC GCCGTGTACG ACAGCCAAGG TTATGCCACC GCCTTCGGCC 

851 CGAAAGACAA CCCCGCCACA AATTGGGCGA ACAGCCGCCA CCGTGCGCTC 

901 AACCTGTTCG CCGGCATCGA ACACCGCTTC AACCAAGACT GGAAACTCAA 

951 AGCCGAATAC GACTACACCC GCAGCCGCTT CCGCCAGCCC TACGGCGTAG 

1001 CAGGCGTGCT TTCCATCGAC CACAACACCG CCGCCACCGA CCTGATTCCC 

1051 GGTTATTGGC ACGCCGACCC GCGCACCCAC AGCGCCAGCG TGTCATTAAT 

1101 CGGCAAATAC CGCCTGTTCG GCCGCGAACA CGATTTAATC GCGGGTATCA 

1151 ACGGTTACAA ATACGCCAGC AACAAATACG GCGAACGCAG CATCATCCCC 

1201 AACGCCATTC CCAACGCCTA CGAATTTTCC CGCACGGGTG CCTACCCGCA 

1251 GCCTGCATCG TTTGCCCAAA CCATCCCGCA ATACGGCACC AGGCGGCAAA 

1301 TCGGCGGCTA TCTCGCCACC CGTTTCCGCG CCGCCGACAA CCTTTCGCTG 

1351 ATACTCGGCG GCAGATACAG CCGTTACCGC ACCGGCAGCT ACGACAGCCG 

1401 CACACAAGGC ATGACCTATG TGTCCGCCAA CCGTTTCACC CCCTACACAG 

1451 GCATCGTGTT CGACCTGACC GGCAACCTGT CGCTTTACGG CTCGTACAGC 

1501 AGCCTGTTCG TCCCGCAATC GCAAAAAGAC GAACACGGCA GCTACCTGAA 

1551 ACCCGTAACC GGCAACAATC TGGAAGCCGG CATCAAAGGC GAATGGCTTG 

1601 AAGGCCGTCT GAACGCATCC GCCGCCGTGT ACCGCGCCCG TAAAAACAAC 

1651 CTCGCCACCG CAGCAGGACG CGACCCGAGC GGCAACACCT ACTACCGCGC 

1701 CGCCAACCAA GCCAAAACCC ACGGCTGGGA AATCGAAGTC GGCGGCCGCA 

1751 TCACGCCCGA ATGGCAGATA CAGGCAGGTT ACAGCCAAAG CAAAACCCGC 

1801 GACCAAGACG GCAGCCGCCT GAACCCCGAC AGCGTACCCG AACGCAGCTT 

1851 CAAACTCTTC ACTGCCTACC ACTTTGCCCC CGAAGCCCCC AGCGGCTGGA 

1901 CCATCGGCGC AGGCGTGCGC TGGCAGAGCG AAACCCACAC CGACCCTGCC 

1951 ACGCTCCGCA TCCCCAACCC CGCCGCCAAA GCCCGCGCCG CCGACAACAG 

2001 CCGCCAAAAA GCCTACGCCG TCGCCGACAT CATGGCGCGT TACCGCTTCA 

2051 ATCCGCGCGC CGAACTGTCG CTGAACGTGG ACAATCTGTT CAACAAACAC 

2101 TACCGCACCC AGCCCGACCG CCACAGCTAC GGCGCACTGC GGACAGTGAA 

2151 CGCGGCGTTT ACCTATCGGT TTAAATAA 

This encodes a protein having amino acid sequence <SEQ ID 668>: 

1 MTRFKYSLLF AALLPVYAQA DVSVSDDPKP QESTELPTIT VTADRTASSN 

51 DGYTVSGTHT PLGLPMTLRE IPQSVSVITS QQMRDQNIKA LDRALLQATG 

101 TSRQIYGSDR AGYNYLFARG SRIANYQING IPVADALADT GNANTAAYER 

151 VEVVRGVAGL LDGTGEPSAT VNLVRKRPTR KPLFEVRAEA GNRKHFGLGA 

201 DVSGSLNAEG TLRGRLVSTF GRGDSWRQRE RSRDAELYGI LEYDIAPQTR 

251 VHAGMDYQQA KETADAPLSY AVYDSQGYAT AFGPKDNPAT NWANSRHRAL 

301 NLFAGIEHRF NQDWKLKAEY DYTRSRFRQP YGVAGVLSID HNTAATDLIP 

351 GYWHADPRTH SASVSLIGKY RLFGREHDLI AGINGYKYAS NKYGERSIIP 

401 NAIPNAYEFS RTGAYPQPAS FAQTIPQYGT RRQIGGYLAT RFRAADNLSL 

451 ILGGRYSRYR TGSYDSRTQG MTYVSANRFT PYTGIVFDLT GNLSLYGSYS 

501 SLFVPQSQKD EHGSYLKPVT GNNLEAGIKG EWLEGRLNAS AAVYRARKNN 

551 LATAAGRDPS GNTYYRAANQ AKTHGWEIEV GGRITPEWQI QAGYSQSKTR 

601 DQDGSRLNPD SVPERSFKLF TAYHFAPEAP SGWTIGAGVR WQSETHTDPA 

651 TLRIPNPAAK ARAADNSRQK AYAVADIMAR YRFNPRAELS LNVDNLFNKH 

701 YRTQPDRHSY GALRTVNAAF TYRFK* 

ORF23a and ORF23-1 show 99.2% identity in 725 aa overlap: 

10 20 30 40 50 60 

orf23a.pep MTRFKYSLLFAALLPVYAQADVSVSDDPKPQESTELPTITVTADRTASSNDGYTVSGTHT 



1 I I I I 1 I M I I 1 1 I I M 1 ! i I i I I I I I t ! I I i 1 i t I I I I t t II i 1 I I I I I 1 I I I I I I 1 I I 
orf 23-1 MTRFKYSLLFAALLPVYAQADVSVSDDPKPQESTELPTITVTADRTASSNDGYTVSGTHT 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf23a pep PLGLPMTLREIPQSVSVITSQQMRDQNIKALDRALLQATGTSRQIYGSDRAGYNYLFARG 
||IMIIIIIIIIMIIMIIMIIIIIl:lllllM!tl)lllliiiMlilliMlii 
orf 23-1 PLGLPMTLREIPQSVSVITSQQMRDQNIKTLDRALLQATGTSRQIYGSDRAGYNYLFARG 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 2 3a pep SRIANYQINGIPVADALADTGNANTAAYERVEWRGVAGLLDGTGEPSATVNLVRKRPTR 
IIMIMMIilillllMllilllllilMIIIIMIIIIIMIIIIIIIIIIIII II 
or f 2 3 - 1 SRI AN YQINGI PVADALADTGNANTAAYERVE WRGVAGLLDGTGE PS AT VNLVRKRLTR 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 2 3a. pep KPLFEVRAEAGNRKHFGLGADVSGSLNAEGTLRGRLVSTFGRGDSWRQRERSRDAELYGI 
I I I I I I I II I I I II I I I I I I I I I I I I : I I I 1 I I I I I I I I I I I I I I I : I I I I 1 I II I I I I 
orf 23-1 KPLFEVRAEAGNRKHFGLDADVSGSLNTEGTLRGRLVSTFGRGDSWRRRERSRDAELYGI 

190 200 210 220 230 240 






250 260 270 280 290 300 

orf 2 3a. pep LEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAFGPKDN PATNWANSRHRAL 

I I j I 1 I I I I I I I I I I I II I I I 1 I 1 I M I I I I i I I I I I I I I 1 I I I I I I I I I M I I I I I I I I 
orf 23-1 LEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAFGPKDN PATNWANSRHRAL 

250 260 270 280 290 300 






310 320 330 340 350 360 

orf 23a . pep NLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHNTAATDLIPGYWHADPRTH 
I I I I I I I I I I I I I I I I I M I I I I I I 1 I I I I II I I 1 I I I I I II I I I 1 I I I 1 I M I I I I I I I 
orf 23-1 NLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHNTAATDLIPGYWHADPRTH 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 23a. pep SASVSLIGKYRLFGREHDLIAGINGYKYASNKYGERSIIPNAIPNAYEFSRTGAYPQPAS 
I I I I I I I I I I I I M I I I I ! I I I I I I I I I I I I II I I I I 1 I II I I I I I II I I I I I 1 I I I I I I 
orf 2 3-1 SASVSLIGKYRLFGREHDLIAGINGYKYASNKYGERSIIPNAIPNAYEFSRTGAYPQPAS 
370 380 390 400 410 420 

430 440 450 460 470 480 

orf 23a . pep FAQTIPQYGTRRQIGGYLATRFRAADNLSLILGGRYSRYRTGSYDSRTQGMTYVSANRFT 
I I I II I I I I II I I I II I I I I I I I I M I I I I I I I I I I : I I I I I I I I I I I I II I I I I I I I I I 
orf 23-1 FAQTIPQYGTRRQIGGYLATRFRAADNLSLILGGRYTRYRTGSYDSRTQGMTYVSANRFT 
430 440 450 460 470 480 






490 500 510 520 530 540 

orf 2 3a. pep PYTGIVFDLTGNLSLYGSYSSLFVPQSQKDEHGSYLKPVTGNNLEAGIKGEWLEGRLNAS 
I t I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I 
orf 23-1 PYTGIVFDLTGNLSLYGSYSSLFVPQSQKDEHGSYLKPVTGNNLEAGIKGEWLEGRLNAS 

490 500 510 520 530 540 






550 560 570 580 590 600 

orf 23a. pep AAVYRARKNNLATAAGRDPSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKTR 
I I I I II I I I I I I ! I I I I I I I I I I I I I I ! I II I I I I I I 1 I I I M I I I I I I I I I I I I I M I I 
orf23-1 AAVYRARKNNLATAAGRDPSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKTR 

550 560 570 580 590 600 






610 620 630 640 650 660 

orf 23a. pep DQDGSRLNPDSVPERSFKLFTAYHFAPEAPSGWTIGAGVRWQSETHTDPATLRIPNPAAK 
I I I I I I I I I I I I I I I I I II I I I I I f I I I I I I I II 1 I I I I I I I I I I I II I I I I I I I I I I I i 
orf 23-1 DQDGSRLNPDSVPERSFKLFTAYHFAPEAPSGWTIGAGVRWQSETHTDPATLRIPN PAAK 

610 620 630 640 650 660 

670 680 690 700 710 720 

Orf 23a . pep ARAADNSRQKAYAVADIMARYRFNPRAELSLNVDNLFNKHYRTQPDRHSYGALRTVNAAF 
I M I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! 
orf 23-1 ARAADNSRQKAYAVADIMARYRFNPRAELSLNVDNLFNKHYRTQPDRHSYGALRTVNAAF 
670 680 690 700 710 720 
orf23a.pep TYRFKX 
I I I I I I 

orf23-l TYRFKX 
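
When two sequences are this similar, it can be more informative to list the individual substitutions than to quote only the percentage. The following is a minimal sketch, assuming an ungapped alignment supplied as equal-length strings; the function name `substitutions` and its `start` parameter are hypothetical.

```python
def substitutions(a: str, b: str, start: int = 1):
    """Return (position, residue_in_a, residue_in_b) for each mismatch."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return [(start + i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Residues 81-90 of ORF23a and ORF23-1, taken from the alignment above.
print(substitutions("QQMRDQNIKA", "QQMRDQNIKT", start=81))  # [(90, 'A', 'T')]
```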


Homology with a predicted ORF from N. gonorrhoeae 

ORF23 shows 93.4% identity over a 211aa overlap with a predicted ORF (ORF23.ng) from N. 
gonorrhoeae: 

orf23.pep GYNYLFARGSRIANYQINGIPVADALADTGNANTAAYERVEVVRGVAGLLD 51 

10 ! I i ! I I I I i i I I ( I I I I I I f I I I t I I f i I I I I I I I I t I I I I I t I I I I I I I 

orf23ng SAVDACRIPGYNYLFARGSRIANYQINGIPVADALADTGNANTAAYERVEWRGVAGLPD 60 

orf23.pep GTGEPSATVNLVRKRLTRKPLFEVRAEAGNRKHFGLDADVSGSLNTEXXLRGRLVSTFGR 111 
j f I t I I I I I I f I i i : i I i i 1 ! I I I I I I t I I I t t t I I I I i I I I I : I : I I ( ! I i I I II 1 
15 orf23ng GTGEPSATVNLVRKHPTRKPLFEVRAEAGNRKHFGLGADVSGSLNAEGTLRGRLVSTFGR 120 

orf 23 . pep GDSWRRRERSRXAELYGILEYDIAPQTRVHAXMDYQQAKETADAPLSYAVYDSQGYATAF 171 

Hill: I i I I I 1 I i I i I i t t i i I I I t i I I I ! I i It I I I I I II I I I I 1 I f f I ! I I I I i 
orf23ng GDSWRQLERSRDAELYGILEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAF 180 

20 

orf 23 .pep GPKDNPATNWANSHHRALNLFAGIEHRFNQDWKLKAEYDY 211 

I I I I I I I I I I : I I : : I I I I I i I i i I I I I I I I I I I I I i I I I 
or f 2 3ng GPKDNPATNWSNSRNRALNLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSI DHS 24 0 

The ORF23ng nucleotide sequence <SEQ ID 669> is predicted to encode a protein comprising 
amino acid sequence <SEQ ID 670>: 

1 SAVDACRIPG YNYLFARGSR IANYQINGIP VADALADTGN ANTAAYERVE 

51 VVRGVAGLPD GTGEPSATVN LVRKHPTRKP LFEVRAEAGN RKHFGLGADV 

101 SGSLNAEGTL RGRLVSTFGR GDSWRQLERS RDAELYGILE YDIAPQTRVH 

151 AGMDYQQAKE TADAPLSYAV YDSQGYATAF GPKDNPATNW SNSRNRALNL 

201 FAGIEHRFNQ DWKLKAEYDY TRSRFRQPYG VAGVLSIDHS TAATDLIPGY 

251 WHADPRTHSA SMSLTGKYRL FGREHDLIAG INGYKYASNK YGERSIIPNA 

301 IPNAYEFSRT GAYPQPSSFA QTIPQYDTRR QIGGYLATRF RAADNLSLIL 

351 GGRYSRYRAG SYNSRTQGMT YVSANRFTPY TGIVFDLTGN LSLYGSYSSL 

401 FVPQLQKDEH GSYLKPVTGN NLEADIKGEW LEGRLNASAA VYRARKNNLA 

451 TAAGRDQSGN TYYRAANQAK THGWEIEVGG RITPEWQIQA GYSQSKPRDQ 

501 DGSRLNPDSV PERSFKLFTA YHLAPEAPSG RTIGAGVRRQ GETHTDPAAL 

551 RIPNPAAKAR AVANSRQKAY AVADIMARYR FNPRTELSLN VDNLFNKHYR 

601 TQPDRHSYGA LRTVNAAFTY RFK* 

Further work revealed the complete nucleotide sequence <SEQ ID 671>: 

1 ATGACACGCT TCAAATACTC CCTGCTTTTT GCCGCCCTGC TACCCGTGTA 

51 CGCGCAGGCC GATGTTTCTG TTTCAGACGA CCCCAAACCG CAGGAAAGCA 

101 CCGAATTGCC GACCATCACC GTTACCGCCG ACCGCACCGC GAGTTCCAAC 

151 GACGGCTACA CCGTTTCCGG CACGCACACC CCGTTCGGGC TGCCCATGAC 

201 CCTGCGCGAA ATCCCGCAGA GCGTCAGCGT CATCACATCG CAACAAATGC 

251 GCGACCAAAA CATCAAAACG CTCGACCGCG CCCTGTTGCA GGCGACCGGC 

301 ACCAGCCGCC AGATTTACGG CTCCGACCGC GCGGGCTACA ACTACCTGTT 

351 CGCGCGCGGC AGCCGCATCG CCAACTACCA AATCAACGGC ATCCCCGTTG 

401 CCGACGCGCT GGCCGATACG GGCAATGCCA ACACCGCCGC CTATGAGCGC 

451 GTAGAAGTCG TGCGCGGCGT GGCGGGGCTG CCGGACGGCA CGGGCGAGCC 

501 TTCTGCCACC GTCAATCTGG TACGCAAACA CCCGACCCGC AAGCCATTGT 

551 TTGAAGTCCG CGCCGAAGCC GGCAACCGCA AACATTTCGG GCTGGGCGCG 

601 GACGTATCGG GCAGCCTGAA CGCCGAAGGC ACGCTGCGCG GCCGCCTGGT 

651 TTCCACCTTC GGACGCGGCG ACTCGTGGCG GCAGCTCGAA CGCAGCCGCG 

701 ATGCCGAACT CTACGGCATT TTGGAATACG ACATCGCACC GCAAACCCGC 

751 GTCCACGCAG GCATGGACTA CCAGCAGGCG AAAGAAACCG CAGACGCGCC 

801 GCTCAGCTAC GCCGTGTACG ACAGCCAAGG TTATGCCACC GCCTTCGGCC 

851 CAAAAGACAA CCCCGCCACA AATTGGTCGA ACAGCCGCAA CCGTGCGCTC 

901 AACCTGTTCG CCGGCATAGA ACACCGCTTC AACCAAGACT GGAAACTCAA 

951 AGCCGAATAC GACTACACCC GTAGCCGCTT CCGCCAGCCC TACGGTGTGG 

1001 CAGGCGTACT TTCCATCGAC CACAGCACTG CCGCCACCGA CCTGATTCCC 

1051 GGTTATTGGC ACGCcgatCC GCGCACCCAC AGCGCCAGCA TGTCATTGAC 

1101 CGGCAAATAC CgcctGTTCG GCCGCGAGCA CGATTTAATC GCGGGTATCA 

1151 ACGGCTACAA ATACGCCAGC AACAAATACG GCGAACGCAG CATCATTCCC 

1201 AACGCCATTC CCAACGCCTA CGAATTTTCC CGCACGGGCG CCTATCCGCA 

1251 GCCATCATCG TTTGCCCAAA CCATCCCGCA ATACGACACC AGGCGGCAAA 

1301 TCGGCGGCTA TCTCGCCACC CGTTTCCGCG CCGCCGACAA CCTTTCGCTG 

1351 ATACTCGGCG GCAGATACAG CCGCTACCGC GCAGGCAGCT ACAACAGCCG 

1401 CACACAAGGC ATGACCTATG TGTCCGCCAA CCGTTTCACC CCCTACACAG 

1451 GCATCGTGTT CGATCTGACC GGCAACCTGT CGCTTTACGG CTCGTACAGC 

1501 AGCCTGTTCG TCCCGCAATT GCAAAAAGAC GAACACGGCA GCTACCTGAA 

1551 ACCCGTAACC GGCAACAATC TGGAAGCCGA CATCAAAGGC GAATGGCTTG 

1601 AAGGGCGTCT GAACGCATCC GCCGCCGTGT ACCGCGCCCG TAAAAACAAC 

1651 CTCGCCACCG CAGCAGGACG CGACCAGAGC GGCAACACCT ACTATCGCGC 

1701 CGCCAACCAA GCCAAAACCC ACGGCTGGGA AATCGAAGTC GGCGGCCGCA 

1751 TCACGCCCGA ATGGCAGATA CAGGCAGGCT ACAGCCAAAG CAAACCCCGC 

1801 GACCAAGACG GCAGCCGCCT GAACCCCGAC AGCGTAcCCG AACGCAGCTT 

1851 CAAACTCTTC ACCGCCTACC ACTTAGCCCC CGAAGCCCCC AGCGGCCGGA 

1901 CCATcggTGC GGGTGTGCGC CGGCAGGGCG AAACCCACAC CGACCCAGCC 

1951 GCGCTCCGCA TCCCCAACCC CGCCGCCAAA GCCCGCGCCG TCGCCAACAG 

2001 CCGCCAGAAA GCCTACGCCG TCGCCGACAT CATGGCGCGT TACCGCTTCA 

2051 ATCCGCGCAC CGAACTGTCG CTGAACGTGG ACAACCTGTT CAACAAACAC 

2101 TACCGCACCC AGCCCGACCG CCACAGCTAC GGCGCACTGC GGACAGTGAA 

2151 CGCGGCGTTT ACCTATCGGT TTAAATAA 

This corresponds to the amino acid sequence <SEQ ID 672; ORF23ng-1>: 

1 MTRFKYSLLF AALLPVYAQA DVSVSDDPKP QESTELPTIT VTADRTASSN 

51 DGYTVSGTHT PFGLPMTLRE IPQSVSVITS QQMRDQNIKT LDRALLQATG 

101 TSRQIYGSDR AGYNYLFARG SRIANYQING IPVADALADT GNANTAAYER 

151 VEVVRGVAGL PDGTGEPSAT VNLVRKHPTR KPLFEVRAEA GNRKHFGLGA 

201 DVSGSLNAEG TLRGRLVSTF GRGDSWRQLE RSRDAELYGI LEYDIAPQTR 

251 VHAGMDYQQA KETADAPLSY AVYDSQGYAT AFGPKDNPAT NWSNSRNRAL 

301 NLFAGIEHRF NQDWKLKAEY DYTRSRFRQP YGVAGVLSID HSTAATDLIP 

351 GYWHADPRTH SASMSLTGKY RLFGREHDLI AGINGYKYAS NKYGERSIIP 

401 NAIPNAYEFS RTGAYPQPSS FAQTIPQYDT RRQIGGYLAT RFRAADNLSL 

451 ILGGRYSRYR AGSYNSRTQG MTYVSANRFT PYTGIVFDLT GNLSLYGSYS 

501 SLFVPQLQKD EHGSYLKPVT GNNLEADIKG EWLEGRLNAS AAVYRARKNN 

551 LATAAGRDQS GNTYYRAANQ AKTHGWEIEV GGRITPEWQI QAGYSQSKPR 

601 DQDGSRLNPD SVPERSFKLF TAYHLAPEAP SGRTIGAGVR RQGETHTDPA 

651 ALRIPNPAAK ARAVANSRQK AYAVADIMAR YRFNPRTELS LNVDNLFNKH 

701 YRTQPDRHSY GALRTVNAAF TYRFK* 

ORF23ng-1 and ORF23-1 show 95.9% identity in 725 aa overlap: 

10 20 30 40 50 60 

MTRFKYSLLFAALLPVYAQADVSVSDDPKPQESTELPTITVTADRTASSNDGYTVSGTHT 
I | I ! M I I I t I 1 II 1 i i I I I [ I II I i I I i ! I I I I M I ! i t I t I t I I ! I i I I I i I I I 1 I I I 
MTRFKYSLLFAALLPVYAQADVSVSDDPKPQESTELPTITVTADRTASSNDGYTVSGTHT 
10 20 30 40 50 60 

70 80 90 100 110 120 

PLGLPMTLREIPQSVSVITSQQMRDQNIKTLDRALLQATGTSRQIYGSDRAGYNYLFARG 
I : I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I ) I 1 I I II 1 II I M I I I I 
PFGLPMTLRE I PQSVSVITSQQMRDQNIKTLDRALLQATGTSRQIYGSDRAGYNYLFARG 
70 80 90 100 110 120 

130 140 150 160 170 180 

SRIANYQINGIPVADALADTGNANTAAYERVEWRGVAGLLDGTGEPSATVNLVRKRLTR 
I I II I I I I I I II I I I I I I M I II I I I I t I I I I I M I I I II I I I I I I I I I I I I I I I : II 
SRIANYQINGIPVADALADTGNANTAAYERVEWRGVAGLPDGTGEPSATVNLVRKHPTR 
130 140 150 160 170 180 

190 200 210 220 230 240 

KPLFEVRAEAGNRKHFGLDADVSGSLNTEGTLRGRLVSTFGRGDSWRRRERSRDAELYGI 
I I I I I I I I I I I I I II I I I I I I I I i I I : I I I I I I ! I I I I I I I I I II I : I I II I I I I I I I 
KPLFEVRAEAGNRKHFGLGADVSGSLNAEGTLRGRLVSTFGRGDSWRQLERSRDAELYGI 
190 200 210 220 230 240 

250 260 270 280 290 300 

LEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAFGPKDNPATNWANSRHRAL 



orf 23-1 .pep 
orf23ng-l 

orf23-l.pep 
orf23ng-l 

orf 23-1. pep 
orf 23ng-l 

orf23-l .pep 
orf23ng-l 

orf23-l .pep 



|||lMllMMIIl!lltlilliiiiiliiiiiiiiillliil!illliM:l(t:t!l 
orf23na-l LEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAFGPKDNPATNWSNSRNRAL 
250 260 270 280 290 300 

310 320 330 340 350 360 

orf 23-1 pep NLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHNTAATDLIPGYWHADPRTH 
j | | | t i t I I I I I I I I I I I I M I I M I I I I i I I I I I I I M I I: I I M i I II I I I I I I I I I I 
orf23ng-l NLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHSTAATDLIPGYWHADPRTH 
310 320 330 340 350 360 

370 380 390 400 410 420 

orf23-l pep SASVSLIGKYRLFGREHDLIAGINGYKYASNKYGERSIIPNAIPNAYEFSRTGAYPQPAS 
Mini I II I I I I I II i M i 1 I I M I I I I I I I I M M I I II t I 1 I II I I M 1 I I I I I : I 
orf23ng-l SASMSLTGKYRLFGREHDLIAGINGYKYASNKYGERSIIPNAIPNAYEFSRTGAYPQPSS 

370 380 390 400 410 420 

430 440 450 460 470 480 

orf 23-1 . pep FAQTIPQYGTRRQIGGYLATRFRAADNLSLILGGRYTRYRTGSYDSRTQGMTYVSANRFT 
M I I I I I I I 1 I I I 11 I I I I I I I I I I I M I I II I I I : I I I : I I I : I I I I I I I II I I I I I I 
orf23ng-l FAQTIPQYDTRRQIGGYLATRFRAADNLSLILGGRYSRYRAGSYNSRTQGMTYVSANRFT 

430 440 450 460 470 480 

490 500 510 520 530 540 

orf 2 3-1. pep PYTGIVFDLTGNLSLYGSYSSLFVPQSQKDEHGSYLKPVTGNNLEAGIKGEWLEGRLNAS 
i I I i t I I I I M I I I I I M 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I 
orf23ng-l PYTGIVFDLTGNLSLYGSYSSLFVPQLQKDEHGSYLKFVTGNNLEADIKGEWLEGRLNAS 

490 500 510 520 530 540 

550 560 570 580 590 600 

orf 23-1 . pep AAV YRARKNNLATAAGR DPS GNT Y YRAAN Q AKT HGWE I E VGGR I T PE WQ I Q AG Y S Q S KTR 
I II I I I I I I I I I I I I I I I I I I I I I II I II I I I I I I I I I I I I I I I I I I I I I I I M I I I I 
orf23ng-l AAVYRARKNNLAT AAGRDQS GNT Y YRAANQAKTHGWE I EVGGRI T PE WQI QAG Y S QSKPR 

550 560 570 580 590 600 

610 620 630 640 650 660 

orf 23-1 . pep DQDGSRLNPDSVPERS FKLFTAYHFAPEAPSGWT I GAGVRWQSETHTDPATLRI PNPAAK 
i I I II I I ! I I I I I I I II I I I I M I : I I I I I I I I I I I II I I : I 1 I I I I I : I I I I I M I I 
orf23ng-l DQDGSRLNPDSVPERS FKLFTAYHLAPEAPSGRTIGAGVRRQGETHTDPAALRI PNPAAK 

610 620 630 640 650 660 

670 680 690 700 710 720 

orf 23-1. pep ARAADNSRQKAYAVADIMARYRFNPRAELSLNVDNLFNKHYRTQPDRHSYGALRTVNAAF 
IN: I I I I I II I I I II I I if M I I I : I I I II 1 I I I I I I I M I II I I I I I I I ! I I I M I I 
orf23ng-l ARAVANSRQKAYAVADIMARYRFNPRTELSLNVDNLFNKHYRTQPDRHSYGALRTVNAAF 

670 680 690 700 710 720 



orf23-1.pep  TYRFKX
             ||||||
orf23ng-1    TYRFKX

In addition, ORF23ng-1 shows significant homology with an OMP from E. coli:



sp|P16869|FHUE_ECOLI OUTER-MEMBRANE RECEPTOR FOR FE(III)-COPROGEN, FE(III)-
FERRIOXAMINE B AND FE(III)-RHODOTRULIC ACID PRECURSOR >gi|1651542|gnl|PID|d1015403
(D90745) Outer membrane protein FhuE precursor [Escherichia coli]
>gi|1651545|gnl|PID|d1015405 (D90746) Outer membrane protein FhuE precursor
[Escherichia coli] >gi|1787344 (AE000210) outer-membrane receptor for Fe(III)-
coprogen, Fe(III)-ferrioxamine B and Fe(III)-rhodotrulic acid precursor
[Escherichia coli] Length = 729
Score = 332 bits (843), Expect = 3e-90

Identities = 228/717 (31%), Positives = 350/717 (48%), Gaps = 60/717 (8%)

Query: 38 TITVTADRTASSN--DGYTVSGTHTPFGLPMTLREIPQSVSVITSQQMRDQNIKTLDRAL 95 

T+ V TA + + Y+V-l- T + MT R+IPQSV++4- + Q+M DQ -f+TL + 

Sbjct: 4 3 TVIVEGSATAPDDGENDYSVTSTSAGTKMQMTQRDIPQSVTIVSQQRMEDQQLQTLGEVM 102 

Query: 96 LQATGTSRQIYGSDRAGYNYLFARGSRIANYQINGIP VADALADTGNANTAA 147 

G S+ SDRA Y ++RG +1 NY -f+GIP + DAL+D A 
Sbjct: 103 ENTLGISKSQADSDRALY Y S RG FQ I DN YM V DGIPTYFES RWN LG DAL S DM AL 154 
Query: 148 YERVEWRGVAGLPDGTGEPSATVNLVRKHPTRKPLF-EVRAEAGNRKHFGLGADVSGSL 206

+ERVEWRG GL GTG PSA +N+VRKH T + +V AE G+ AD+ L 

Sbjct: 155 FERVEWRGATGLMTGTGNPSAAINMVRKHATSREFKGDVSAEYGSWNKERYVADLQSPL 214 

Query: 207 NAEGTLRGRLVSTFGRGDSWRQLERSRDAELYGILEYDIAPQTRVHAGMDYQQAKETADA 266 

+G +R R+V + DSW S GI++ D+ T + AG +YQ+ + 

Sbjct: 215 TEDGKIRARIVGGYQNNDSWLDRYNSEKTFFSGIVDADLGDLTTLSAGYEYQRIDVNSPT 274 

Query: 267 PLSYAVYDSQGYATAFGPKDNPATNWSNSRNRALNLFAGIEHRFNQDWKLKAEYDYTRSR 326

+++ G + ++ + A +W+ + +F ++ +F W+ ++ 

Sbjct: 275 WGGLPRWNTDGSSNSYDRARSTAPDWAYNDKEINKVFMTLKQQFADTWQATLNATHSEVE 334 

Query: 327 F- -RQPYGVAGVLS I DHSTAA — T DLI PGY WHADPRTHSA- SMSLTGKYRLFG 374 

F + YAVD ++ PG+ W++ R A + G Y LFG

Sbjct: 335 FDSKMMYVDAYVNKADGMLVGPYSNYGPGFDYVGGTGWNSGKRKVDALDLFADGSYELFG 394 

Query: 375 REHDLI AGINGYKYASNKYGER — SI I PNAI PNAYEFSRTGAYPQPS S FAQT I PQ YDTRR 432 
R+H+L+ G Y +N+Y +1 P+ I + Y F+ G +PQ Q++ Q DT 

Sbjct: 395 RQHNLMFG-GSYSKQNNRYFSSWANIFPDEIGSFYNFN--GNFPQTDWSPQSLAQDDTTH 451

Query: 433 QIGGYLATRFRAADNLSLILGGRYSRYRAGSYNSRTQGMTY-VSANRFTPYTGIVFDXXX 491 

Y ATR AD L LILG RY4- +R + +TY + N TPY G+VFD 

Sbjct: 452 MKSLYAATRVTLADPLHLILGARYTNWRVDT LTYSMEKNHTTPYAGLVFDIND 504 



Query: 492 XXXXXXXXXXXFVPQLQKDEHGSYLKPVTGNNLEADIKGEWLEGRLNASAAVYRARKNNL 551

F PQ +D G YL P+TGNN E +K +W+ RL + A++R ++N+ 
Sbjct: 505 NWSTYASYTSIFQPQNDRDSSGKYLAPITGNNYELGLKSDWMNSRLTTTLAIFRIEQDNV 564 



Query: 552 ATAAGR---DQSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKPRDQDGSRLN 608

A + G +G T Y+A + + G E E+ G IT WQ+ G ++ D +G-J- +N 

Sbjct: 565 AQSTGTPIPGSNGETAYKAVDGTVSKGVEFELNGAITDNWQLTFGATRYIAEDNEGNAVN 624 

Query: 609 PDSVPERSFKLFTAYHLAPEAPSGRTIGAGVRRQGETHTDPAALRIPNPAAKARAVANSR 668 
P ++P + K+FT+Y LP P T+G GV Q +TD P RA

Sbjct: 625 P-NLPRTTVKMFTSYRL-PVMPE-LTVGGGVNWQNRVYTDTV TPYGTFRA E 672 

Query: 669 QKAYAVADIMARYRFNPRTELSLNVDNLFNKHYRTQPDRH-SYGALRTVNAAFTYRF 724 
Q +YA+ D+ RY+ L NV+NLF+K Y T + YG R + TY+F 

Sbjct: 673 QGSYALVDLFTRYQVTKNFSLQGNVNNLFDKTYDTNVEGSIVYGTPRNFSITGTYQF 729
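
The percentages in the BLAST summary above follow from the reported counts. The following is a minimal Python sketch, assuming (as the printed values suggest) that each percentage is truncated to a whole number; the helper name `pct` is illustrative only and is not part of any BLAST tool.

```python
def pct(count: int, total: int) -> int:
    """Whole-number percentage, truncated by integer division (an assumption)."""
    return (100 * count) // total


# Counts taken from the summary line: 228, 350 and 60 out of 717 aligned columns.
aligned_columns = 717
summary = {
    "identities": pct(228, aligned_columns),
    "positives": pct(350, aligned_columns),
    "gaps": pct(60, aligned_columns),
}
print(summary)
```

Under that truncation assumption the three values agree with the 31%, 48% and 8% printed in the summary.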

Based on this analysis, it was predicted that these proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF23-1 (77.5kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
15A shows the results of affinity purification of the His-fusion protein, and Figure 15B shows the
results of expression of the GST-fusion in E. coli. Purified His-fusion protein was used to immunise
mice, whose sera were used for Western blot (Figure 15C) and for ELISA (positive result). These
experiments confirm that ORF23-1 is a surface-exposed protein, and that it is a useful immunogen.

Example 80

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 673>: 

1 ATGCGCACGG CAGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC 
51 GGCAATGATG CCGGAAATGG TGTGCGCGGG CGTGTCGCCG GGAACGGCAA 
101 TCATATCCAA GCCGACCGAA CAAACGGCGG TCATGGCTTC GAGTTTGTCC 
151 AGCGTCAgcA CGCCTGCTTC GGCGgcGgCa ATCATACCTT CGTCTTCGGA
201 AACGGGGATA AACGcGCCAC TCAAACCCCC GACCGCGCTG GAAGCCATCA 
251 TGCCGCCTTT TTTCACGGCA TCGTTCAGCA ATGCCAAAGC TGCTGTTGTG 
301 CCGTGCGTAC CGCAGACGCT CAAGCCCATT TnTTCAAGAA TGCGTGCCAC 
351 TnAGTCGCCG ACGGGG..

This corresponds to the amino acid sequence <SEQ ID 674; ORF24>: 

1 MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIISKPTE QTAVMASSLS
51 SVSTPASAAA IIPSSSETGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV
101 PCVPQTLKPI XSRMRATXSP TG..

Further work revealed the complete nucleotide sequence <SEQ ID 675>:

1 ATGCGCACGG CAGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC
51 GGCAATGATG CCGGAAATGG TGTGCGCGGG CGTGTCGCCG GGAACGGCAA
101 TCATATCCAA GCCGACCGAA CAAACGGCGG TCATGGCTTC GAGTTTGTCC
151 AGCGTCAGCA CGCCTGCTTC GGCGGCGGCA ATCATACCTT CGTCTTCGGA
201 AACGGGGATA AACGCGCCAC TCAAACCCCC GACCGCGCTG GAAGCCATCA
251 TGCCGCCTTT TTTCACGGCA TCGTTCAGCA ATGCCAAAGC TGCTGTTGTG
301 CCGTGCGTAC CGCAGACGCT CAAGCCCATT TCTTCAAGAA TGCGTGCCAC
351 TGAGTCGCCG ACGGCGGGGG TCGGCGCCAG CGACAAGTCG AGAATACCAA
401 ACGGGATATT CAGCATTTTT GAGGCTTCGC GGCCGATGAG TTCGCCCACG
451 CGGGTAATTT TGAAAGCAGT TTTCTTCACT ACTTCCGCAA CTTCGGTCAA
501 TGTCGTTGCA TCTGAATTTT CCAACGCGGC TTTTACGACA CCTGGGCCGG
551 ATACGCCGAC ATTGATAACG GCATCCGCTT CGCCCGAACC ATGAAACGCG
601 CCCGCCATAA ACGGGTTGTC TTCCACCGCG TTGCAGAACA CGACAATTTT
651 AGCGCAGCCG AAACCTTCGG GCGTGATTTC CGCCGTGCGT TTGACGGTTT
701 CGCCCGCCAG CTTGACCGCA TCCATATTGA TACCGGCACG CGTACTGCCG
751 ATATTGATGG AGCTGCACAC AATATCGGTA GTCTTCATCG CTTCGGGAAT
801 GGAGCGGATT AACACCTCAT CCGAAGGCGA CATCCCTTTT TGCACCAACG
851 CGGAAAAACC GCCGATAAAA GACACACCGA TGGCTTTGGC AGCTTTATCC
901 AAAGTTTGCG CCACGCTGAC GTAA

This corresponds to the amino acid sequence <SEQ ID 676; ORF24-1>:

1 MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIISKPTE QTAVMASSLS
51 SVSTPASAAA IIPSSSETGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV
101 PCVPQTLKPI SSRMRATESP TAGVGASDKS RIPNGIFSIF EASRPMSSPT
151 RVILKAVFFT TSATSVNVVA SEFSNAAFTT PGPDTPTLIT ASASPEP*NA
201 PAINGLSSTA LQNTTILAQP KPSGVISAVR LTVSPASLTA SILIPARVLP
251 ILMELHTISV VFIASGMERI NTSSEGDIPF CTNAEKPPIK DTPMALAALS
301 KVCATLT*
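
The correspondence between a nucleotide listing and its amino acid listing can be checked by translating the reading frame. The following is a minimal Python sketch, assuming the standard genetic code and reading frame 1; the function name `translate` and the use of `X` for codons containing an ambiguous base are illustrative choices, not a description of the software used in this work.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate frame 1; codons with ambiguous bases become 'X', stops '*'."""
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3
    codons = (dna[i:i + 3] for i in range(0, usable, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# First 60 nucleotides of the complete sequence listed above.
first_60_nt = (
    "ATGCGCACGG CAGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC GGCAATGATG"
)
protein_start = translate(first_60_nt)
print(protein_start)
```

The translated residues can then be compared with the first twenty residues of the amino acid listing; a lowercase `n` in a partial sequence yields `X` at that codon.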

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 
ORF24 shows 96.4% identity over a 307 aa overlap with an ORF (ORF24a) from strain A of N.
meningitidis: 

10 20 30 40 50 60 

orf24a.pep MRTAVVLLLIMPMAASSAMMPEMVCAGVSPGTAIISXPTEQTAVIASSLSNVSTPASAAA 
I I I I I I I I ! I I I II I I I I I I I I II I I I I I I I I I I 1 I I I I I I I I : I I I I I : I I I 1 I I II I 
45 orf 24 MRTAVVLLLIMPMAASSAMMPEMVCAGVSPGTAI I SKPTEQTAVMASSLS SVSTPASAAA 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 24a. pep 1 1 PS SSXTGINAPLKPPTALEAIMPPFFTASFSNAKAAVVPCVPQTLKPI SSRMRATESP 
50 I I I I I I I I I I I I I I I I I I I I I I I I I I 11 I I I I I I I I I I I I I I I I I I I i I I I I I I I I I II 

orf 2 4 1 1 PSSSETGINAPLKPPTALEAIMPPFFTAS FSNAKAAVVPCVPQTLKPI SSRMRATESP 

70 80 90 100 110 120 

130 140 150 160 170 180 

55 orf 24a. pep TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVNWASEFSNAAFTT 

! I I 1 I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I II M 1 I II I I I I I I M I M I I II I 
orf 24 TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVNWASEFSNAAFTT 
130 140 150 160 170 180

190 200 210 220 230 240 

PG PDT PTLITAS AS PEPXNAPAIXGLS SXALQNTTI LAQPKPSS VI SXVRLMVS PAS LTA 
I | | | || | ( | | | | i I I i I i f I t t I I t I I : I i t I I I I I ! M I I I : 1 I I III I I Ml I I I 
PG PDT PTLITAS AS PEPXNAPAINGLS STALQNTT I LAQPKPSGV I SAVRLT VS PAS LTA 
190 200 210 220 230 240 



orf24a.pep 
orf24 



250 260 270 280 290 300 

1 0 orf 24a oep SILIPARVLPILMELHTISWFIASGMERXNTSSEGDIPFCTSAEKPPIKDTPMALAALS 

1U orr24a.pep ,,,,,,,,,,,,,,,,,,,,,,,,,, , , , ,, , , , , ,, , - , | | | | | | || M I I I I I I 

orf 24 S ILI PARVLPILMELHTI SWFIASGMERINTSSEGDI PFCTNAEKPPIKDTPMALAALS 

250 260 270 280 290 300 

15 

orf 24a. pep KVCATLTX 
I I I I t I I ! 
orf 2 4 KVCATLTX 

The complete length ORF24a nucleotide sequence <SEQ ID 677> is: 

1 ATGCGCACGG CAGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC
51 GGCAATGATG CCGGAAATGG TGTGCGCGGG TGTGTCGCCG GGAACGGCAA
101 TCATATCCAA NCCGACCGAA CAAACGGCGG TCATCGCTTC GAGTTTATCC
151 AACGTCAGCA CGCCTGCTTC GGCGGCGGCA ATCATACCTT CGTCTTCGGA
201 NACGGGGATA AACGCGCCAC TCAAACCGCC AACCGCGCTC GAAGCCATCA
251 TGCCGCCCTT TTTCACGGCA TCGTTCAGCA ATGCCAAAGC TGCTGTTGTG
301 CCGTGCGTAC CGCAGACGCT CAAACCCATT TCTTCAAGAA TGCGCGCCAC
351 CGAGTCGCCG ACGGCAGGGG TCGGTGCCAG CGACAAGTCG AGAATACCAA
401 ACGGGATATT CAGCATTTTT GAGGCTTCGC GGCCGATGAG TTCGCCCACG
451 CGGGTAATTT TGAAGGCGGT TTTCTTCACA ACTTCGGCAA CTTCGGTCAA
501 TGTCGTTGCA TCCGAATTTT CCAACGCGGC TTTTACGACA CCCGGGCCGG
551 ATACGCCGAC ATTAATCACA GCATCCGCTT CGCCTGAGCC GTGAAACGCG
601 CCCGCCATAN ACGGGTTGTC TTCCNCCGCG TTGCAGAACA CGACGATTTT
651 GGCGCAGCCG AAACCTTCTA GTGTGATTTC ANCCGTGCGT TTGATGGTTT
701 CGCCCGCCAG TCTGACCGCG TCCATATTGA TACCGGCGCG CGTACTGCCG
751 ATATTGATGG AGCTGCACAC GATATCAGTA GTCTTCATCG CTTCGGGAAT
801 GGAACGGATN AACACCTCGT CAGAAGGCGA CATACCTTTT TGCACCAGCG
851 CGGAAAAGCC GCCAATAAAA GACACGCCGA TGGCTTTGGC AGCCTTATCC
901 AAAGTTTGCG CCACGCTGAC GTAA

This encodes a protein having amino acid sequence <SEQ ID 678>:

1 MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIISXPTE QTAVIASSLS
51 NVSTPASAAA IIPSSSXTGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV
101 PCVPQTLKPI SSRMRATESP TAGVGASDKS RIPNGIFSIF EASRPMSSPT
151 RVILKAVFFT TSATSVNVVA SEFSNAAFTT PGPDTPTLIT ASASPEP*NA
201 PAIXGLSSXA LQNTTILAQP KPSSVISXVR LMVSPASLTA SILIPARVLP
251 ILMELHTISV VFIASGMERX NTSSEGDIPF CTSAEKPPIK DTPMALAALS
301 KVCATLT*

It should be noted that this protein includes a stop codon at position 198. 
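
The position of the internal stop can be confirmed from the listings themselves. The following is a minimal Python sketch using the row of the ORF24a protein that covers residues 151 to 200 and the row of the ORF24a nucleotide sequence that covers nucleotides 551 to 600; the variable names are illustrative only.

```python
# Residues 151-200 of the ORF24a protein listing ('*' marks a stop codon).
row_start = 151
protein_row = "RVILKAVFFT TSATSVNVVA SEFSNAAFTT PGPDTPTLIT ASASPEP*NA".replace(" ", "")
stop_position = row_start + protein_row.index("*")

# Nucleotides 551-600 of the ORF24a nucleotide listing.
dna_row_start = 551
dna_row = "ATACGCCGAC ATTAATCACA GCATCCGCTT CGCCTGAGCC GTGAAACGCG".replace(" ", "")

# 1-based position of the first nucleotide of the codon for that residue.
codon_start = 3 * (stop_position - 1) + 1
offset = codon_start - dna_row_start
stop_codon = dna_row[offset:offset + 3]

print(stop_position, codon_start, stop_codon)
```

The residue index found this way can be compared with the position stated above, and the extracted codon shows which stop codon is present in the nucleotide listing.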
ORF24a and ORF24-1 show 96.4% identity in 307 aa overlap:

10 20 30 40 50 60 

50 orf 24a. pep MRTAVVLLLIMPMAASSAMMPEMVCAGVSPGTAIISXPTEQTAVIASSLSNVSTPASAAA 

I I t I I I I II II i f I I t II ! t II I I I I I I I I M I I I I I I I I I I I : I I I I I : I I I I I I I I I 
orf 24-1 MRTAWLLLIMPMAASSAMMPEMVCAGVSPGTAIISKPTEQTAVMASSLSSVSTPASAAA 
10 20 30 40 50 60 

55 70 80 90 100 110 120 

orf 24a. pep IIPSSSXTGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPISSRMRATESP 
Mill! I I I ! I II I I i II II I i I I I II I II I I I I 1 I I I I ! M I I I i I I I I I I I I I M I I 
orf 24-1 IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPISSRMRATESP 
70 80 90 100 110 120 






130 140 150 160 170 180 






orf 24a oep tagvgasdksripngifsifeasrpmssptrvilkavffttsatsvnwasefsnaaftt 

. p ttitiiiiMiiinniiMiiiiiiiiiiiiimmifiiiMiiiitnimii 

orf24-l TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVNWASEFSNAAFTT 
130 140 150 160 170 180 

190 200 210 220 230 240 

orf24a pep PGPDTPTLITASASPEPXNAPAIXGLSSXALQNTTILAQPKPSSVISXVRLMVSPASLTA 

1 1 i 1 1 1 1 1 i 1 1 1 1 1 it f 1 1 1 1 1 1 itii:ininiiiimi:Mi Ml lltlllll 

orf24-l PGPDT PTLITASAS PEPXNAPAINGLS STALQNTTI LAQPKPSGVI S AVRLTVS PAS LTA 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf24a pep SILIPARVLPILMELHTISWFIASGMERXNTSSEGDIPFCTSAEKPPIKDTPMALAALS 
I I I I I II I I > I I I I M 11 1 I I I I I I I I 1 i II i I II I I i I 1 I M I I I I I I I ! I I I I I I I 
orf 24-1 SILIPARVLPILMELHTISWFIASGMERINTSSEGDIPFCTNAEKPPIKDTPMALAALS 

250 260 270 280 290 300 



orf24a.pep  KVCATLTX
            ||||||||
orf24-1     KVCATLTX



Homology with a predicted ORF from N. gonorrhoeae

ORF24 shows 96.7% identity over a 121 aa overlap with a predicted ORF (ORF24ng) from 
N. gonorrhoeae: 

orf24.pep  MRTAVVLLLIMPMAASSAMMPEMVCAGVSPGTAIISKPTEQTAVMASSLSSVSTPASAAA  60
           ||||||||||||||||||||||||||||||||||:|||||||||||||||||:|||||||
orf24ng    MRTAVVLLLIMPMAASSAMMPEMVCAGVSPGTAIMSKPTEQTAVMASSLSSVNTPASAAA  60

orf24.pep  IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAVVPCVPQTLKPIXSRMRATXSP 120
           |||||||||||||||||||||||||||||||||||||||||||||||||| |||||| ||
orf24ng    IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAVVPCVPQTLKPISSRMRATESP 120

orf24.pep  TG                                                           122
           |
orf24ng    TAGVGASDKSRMPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVRLTASEFSSAALTT 180



The complete length ORF24ng nucleotide sequence <SEQ ID 679> is:

1 ATGCGCACGG CGGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC
51 GGCGATGATG CCGGAAATGG TGTGCGCGGG CGTGTCGCCG GGAACGGCAA
101 TCATGTCCAA ACCAACGGAG CAGACGGCGG TCATGGCTTC GAGTTTGTCC
151 AGCGTCAACA CGCCTGCCTC GGCGGCGGCA ATCATACCTT CGTCTTCGGA
201 AACGGGGATA AACGCGCCGC TCAAACCGCC GACCGCGCTG GAAGCCATCA
251 TGCCGCCCTT TTTCACGGCA TCGTTCAGCA ATGCCAAAGC TGCTGTTGTG
301 CCGTGCGTAC CGCAGACGCT CAAGCCCATT TCTTCAAGAA TGCGCGCCAC
351 CGAGTCGCCG ACGGCGGGGG TCGGTGCCAG CGACAAATCG AGAATGCCGA
401 ACGGGATATT CAGCATTTTT GAGGCTTCGC GACCGATGAG TTCGCCCACG
451 CGGGTGATTT TGAAAGCGGT TTTCTTCACG ACTTCGGCGA CCTCGGTCAG
501 GCTGACCGCG TCCGAATTTT CCAGCGCGGC TTTGACCACG CCTGGACCGG
551 ATACGCCGAC ATTAATCACA GCATCCGCTT CGCCCGAGCC GTGGAACGCA
601 CCCGCCATAA ACGGATTGTC TTCCACCGCG TTGCAGAACA CGACGATTTT
651 GGCGCAGCCG AAACCTTCGG GTGTGATTTC AGCCGTGCGT TTGATGGTTT
701 CGCCTGCCAG CTTGACCGCA TCCATATTGA TACCGGCACG CGTGCTGCCG
751 ATATTGATGG AGCTGCACAC GATATCGGTA GTTTTCATCG CTTCGGGAAC
801 GGAACGGATC AACACCTCAT CCGAAGGCGA CATACCTTTT TGCACCAGCG
851 CGGAAAAGCC GCCGATAAAG GACACGCCGA TGGCTTTGGC TGCCTTGTCC
901 AAAGTCTGCG CCACGCTGAC ATAA



This encodes a protein having amino acid sequence <SEQ ID 680>: 

1 MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIMSKPTE QTAVMASSLS
51 SVNTPASAAA IIPSSSETGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV
101 PCVPQTLKPI SSRMRATESP TAGVGASDKS RMPNGIFSIF EASRPMSSPT
151 RVILKAVFFT TSATSVRLTA SEFSSAALTT PGPDTPTLIT ASASPEPWNA
201 PAINGLSSTA LQNTTILAQP KPSGVISAVR LMVSPASLTA SILIPARVLP
251 ILMELHTISV VFIASGTERI NTSSEGDIPF CTSAEKPPIK DTPMALAALS
301 KVCATLT*

ORF24ng and ORF24-1 show 96.1% identity in 307 aa overlap: 

10 20 30 40 50 60 

5 orf24-l pep MRTAWLLLIMPMAASSAMMPEMVCAGVSPGTAIISKPTEQTAVMASSLSSVSTPASAAA 

| | ! 1 1 | 1 I 1 I I i I i I I I I I I i I I f I I i if I If I I : I I I i I i i t I I M f i M I : II I f ! I i 
orf24ng MRTAWLLLIMPMAASSAMMPEMVCAGVSPGTAIMSKPTEQTAVMASSLSSVNTPASAAA 
10 20 30 40 50 60 

10 70 80 90 100 110 120 

orf2 4-l pep 1 1 PS S SETGINAPLKPPTALE AIMPPFFTAS FSNAKAAWPCVPQTLKP I S SRMRATES P 
M I t t I ! I I I I I I I I 1 I I i i I I I I I 1 I I I I I II M I M I II 1 II I I I I I I I I I I i I I II I 
orf24ng IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPISSRMRATESP 
70 80 90 100 110 120 

15 

130 140 150 160 170 180 

orf 24-1 . pep TAGVGASDKSRI PNG I FS I FE ASRPMS S PTRVI LKAVFFTT SAT S VN WASE FSNAAFTT 
I II I I I I II I I : I t I I I I I t I I il I 11 I I I I I I 1 i i I I I i i I I t I I ::IMII:I|:II 
orf24ng TAGVGASDKSRMPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVRLTASEFSSAALTT 
20 130 140 150 160 170 180 

190 200 210 220 230 240 

orf 24-1. pep PGPDTPTLITASASPEPXNAPAINGLSSTALQNTTILAQPKPSGVISAVRLTVSPASLTA 
I I I I I I I I I I I I 1 I I I I I I I I i I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
25 orf24ng PGPDTPTLITASASPEPWNAPAINGLSSTALQNTTILAQPKPSGVISAVRLMVSPASLTA 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 24-1. pep SILIPARVLPILMELHTISWFIASGMERINTSSEGDIPFCTNAEKPPIKDTPMALAALS 
30 I I 1 I I I I I I I I I I I I I II I I I I I I I I I I I I II I II I I I I I I : I I I I I I I I I I I I I 11 I ! 

orf24ng SILIPARVLPILMELHTISVVFIASGTERINTSSEGDIPFCTSAEKPPIKDTPMALAALS 

250 260 270 280 290 300 

35 orf 24-1. pep KVCATLTX 

I I I I I ! I I 

orf24ng KVCATLTX 

Based on this analysis, including the presence of a putative leader sequence (first 18 aa - double-underlined)
and putative transmembrane domains (single-underlined) in the gonococcal protein,
it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could
be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 81 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 681>:

1 ..ACCGACGTGC AAAAAGAGTT GGTCGGCGAA CAACGCAAGT GGGCGCAGGA

51 AAAAATCAGC AACTGCCGAC AAGCCGCCGC GCAGGCAGAC CGGCAGGAAT 

101 ACGCCGAATA CCTCAAGCTG CAATGCGACA CGCGGATGAC GCGCGAACGG 

151 ATACAGTATC TTCGCGGCTA TTCCATCGAT TAG 

This corresponds to the amino acid sequence <SEQ ID 682; ORF25>: 

1 ..TDVQKELVGE QRKWAQEKIS NCRQAAAQAD RQEYAEYLKL QCDTRMTRER

51 IQYLRGYSID * 

Further work revealed the complete nucleotide sequence <SEQ ID 683>:

1 ATGTATCGGA AACTCATTGC GCTGCCGTTT GCCCTGCTGC TTGCCGCTTG
51 CGGCAGGGAA GAACCGCCCA AGGCATTGGA ATGCGCCAAC CCCGCCGTGT
101 TGCAAGGCAT ACGCGGCAAT ATTCAGGAAA CGCTCACGCA GGAAGCGCGT
151 TCTTTCGCGC GCGAAGACGG CAGGCAGTTT GTCGATGCCG ACAAAATTAT
201 CGCCGCCGCC TACGGTTTGG CGTTTTCTTT GGAACACGCT TCGGAAACGC
251 AGGAAGGCGG GCGCACGTTC TGTATCGCCG ATTTGAACAT TACCGTGCCG
301 TCTGAAACGC TTGCCGATGC CAAGGCAAAC AGCCCCCTGT TGTACGGGGA
351 AACTGCTTTG TCGGATATTG TGCGGCAGAA GACGGGCGGC AATGTCGAGT
401 TTAAAGACGG CGTATTGACG GCAGCCGTCC GCTTCCTGCC CGTCAAAGAC
451 GGTCAGACGG CATTTGTCGA CAACACGGTC GGTATGGCGG CGCAAACGCT
501 GTCTGCCGCG CTGCTGCCTT ACGGCGTGAA GAGCATCGTG ATGATAGACG
551 GCAAGGCGGT GAAAAAAGAA GACGCGGTCA GGATTTTGAG CGGAAAAGCC
601 CGTGAAGAAG AACCGTCCAA ACCCACGCCC GAAGACATTT TGGAACACAA
651 TGCCGCCGGC GGCGATGCGG GCGTACCCCA AGCCGCAGAA GGCGCGCCCG
701 AACCGGAAAT CCTGCATCCT GACGACGGCG AGCGTGCCGA TACCGTTACC
751 GTATCACGGG GCGAAGTGGA AGAGGCGCGC GTACAAAACC AGCGTGCGGA
801 ATCCGAAATT ACCAAACTTT GGGGAGGACT CGATACCGAC GTGCAAAAAG
851 AGTTGGTCGG CGAACAACGC AAGTGGGCGC AGGAAAAAAT CAGCAACTGC
901 CGACAAGCCG CCGCGCAGGC AGACCGGCAG GAATACGCCG AATACCTCAA
951 GCTGCAATGC GACACGCGGA TGACGCGCGA ACGGATACAG TATCTTCGCG
1001 GCTATTCCAT CGATTAG



This corresponds to the amino acid sequence <SEQ ID 684; ORF25-1>:

1 MYRKLIALPF ALLLAACGRE EPPKALECAN PAVLQGIRGN IQETLTQEAR
51 SFAREDGRQF VDADKIIAAA YGLAFSLEHA SETQEGGRTF CIADLNITVP
101 SETLADAKAN SPLLYGETAL SDIVRQKTGG NVEFKDGVLT AAVRFLPVKD
151 GQTAFVDNTV GMAAQTLSAA LLPYGVKSIV MIDGKAVKKE DAVRILSGKA
201 REEEPSKPTP EDILEHNAAG GDAGVPQAAE GAPEPEILHP DDGERADTVT
251 VSRGEVEEAR VQNQRAESEI TKLWGGLDTD VQKELVGEQR KWAQEKISNC
301 RQAAAQADRQ EYAEYLKLQC DTRMTRERIQ YLRGYSID*



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF25 shows 98.3% identity over a 60aa overlap with an ORF (ORF25a) from strain A of N. 
meningitidis: 

                                                  10        20        30
orf25.pep                                 TDVQKELVGEQRKWAQEKISNCRQAAAQAD
                                          |||||||||| |||||||||||||||||||
orf25a      VTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEXRKWAQEKISNCRQAAAQAD
           250       260       270       280       290       300

                    40        50        60
orf25.pep   RQEYAEYLKLQCDTRMTRERIQYLRGYSIDX
            |||||||||||||||||||||||||||||||
orf25a      RQEYAEYLKLQCDTRMTRERIQYLRGYSIDX
           310       320       330
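
The identity figure quoted for this overlap can be reproduced by counting matching positions. The following is a minimal Python sketch for two ungapped, equal-length segments; the function name `percent_identity` is illustrative, and an `X` in one sequence is simply counted as a mismatch, which is an assumption about how the original comparison treated unknown residues.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two equal-length segments."""
    if len(a) != len(b):
        raise ValueError("segments must have the same length (no gaps handled)")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# The 60 aa of ORF25 and the corresponding C-terminal 60 aa of ORF25a.
orf25 = (
    "TDVQKELVGE QRKWAQEKIS NCRQAAAQAD RQEYAEYLKL QCDTRMTRER IQYLRGYSID"
).replace(" ", "")
orf25a_tail = (
    "TDVQKELVGE XRKWAQEKIS NCRQAAAQAD RQEYAEYLKL QCDTRMTRER IQYLRGYSID"
).replace(" ", "")

identity = percent_identity(orf25, orf25a_tail)
print(f"{identity:.1f}% identity over {len(orf25)} aa")
```

The single differing position is the `X` in ORF25a, which arises from an ambiguous base in the strain A nucleotide sequence.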



The complete length ORF25a nucleotide sequence <SEQ ID 685> is:

1 ATGTATCGGA AACTCATTGC GCTGCCGTTT GCCCTGCTGC TTGCCGCTTG
51 CGGCAGGGAA GAACCGCCCA AGGCATTGGA ATGCGCCAAC CCCGCCGTGT
101 TGCAANGCAT ACGCNGCAAT ATTCAGGAAA CGCTCACGCA GGAAGCGCGT
151 TCTTTCGCGC GCGAAGACNG CANGCAGTTT GTCGATGCCG ACNAAATTAT
201 CGCCGCCGCC TANGNTNNGN NGNTNTCTTT GGAACACGCT TCGGAAACGC
251 AGGAAGGCGG GCGCACGTTC TGTNTCGCCG ATTTGAACAT TACCGTGCCG
301 TCTGAAACGC TTGCCGATGC CAAGGCAAAC AGCCCCCTGC TGTACGGGGA
351 AACCGCTTTG TCGGATATTG TGCGGCAGAA GACGGGCGGC AATGTCGAGT
401 TTAAAGACGG CGTATTGACG GCAGCCGTCC GCTTCCTACC CGTCAAAGAC
451 GGTCAGANGG CATTTGTCGA CAACACGGTC GGTATGGCGG CGCAAACGCT
501 GTCTGCCGCG TTGCTGCCTT ACGGCGTGAA GAGCATCGTG ATGATAGACG
551 GCAAGGCGGT AAAAAAAGAA GACGCGGTCA GGATTNTGAG CNGANAAGCC
601 CGTGAANAAG AACCGTCCAA ANCCNNGCCC GAAGACATTT TGGAACATAA
651 TGCCGCCGGA GGGGATGCAG ACGTACCCCA AGCCGGAGAA GACGCGCCCG
701 AACCGGAAAT CCTGCATCCT GACGACGGCG AGCGTGCCGA TACCGTTACC
751 GTATCACGGG GCGAAGTGGA AGAGGCGCGN GTACAAAACC AGCGTGCGGA
801 ATCCGAAATT ACCAAACTTT GGGGAGGACT CGATACCGAC GTGCAAAAAG
851 AGTTGGTCGG CGAANAACGC AAGTGGGCGC AGGAAAAAAT CAGCAACTGC
901 CGACAAGCCG CCGCGCAGGC AGACCGGCAG GAATACGCCG AATACCTCAA
951 GCTGCAATGC GACACGCGGA TGACGCGCGA ACGGATACAG TATCTTCGCG
1001 GCTATTCCAT CGATTAG

This encodes a protein having amino acid sequence <SEQ ID 686>:

1 MYRKLIALPF ALLLAACGRE EPPKALECAN PAVLQXIRXN IQETLTQEAR
51 SFAREDXXQF VDADXIIAAA XXXXXSLEHA SETQEGGRTF CXADLNITVP
101 SETLADAKAN SPLLYGETAL SDIVRQKTGG NVEFKDGVLT AAVRFLPVKD
151 GQXAFVDNTV GMAAQTLSAA LLPYGVKSIV MIDGKAVKKE DAVRIXSXXA
201 REXEPSKXXP EDILEHNAAG GDADVPQAGE DAPEPEILHP DDGERADTVT
251 VSRGEVEEAR VQNQRAESEI TKLWGGLDTD VQKELVGEXR KWAQEKISNC
301 RQAAAQADRQ EYAEYLKLQC DTRMTRERIQ YLRGYSID*

ORF25a and ORF25-1 show 93.5% identity in 338 aa overlap: 

10 20 30 40 50 60 

15 orf 2 5a . pep M YRKL I AL P FALLLAAC GREE P PKALE CAN PAVLQX I RXN I QET LT QEARS FARE DXXQ F 

I I 1 I I I I ! I I M 1 I I i I I I I ! I I I I I I t 1 I I I I I I II I I II I I I I I I I I I I I I I II 
orf 25-1 MYRKLIALPFALLLAACGREEPPKALECANPAVLQGIRGNIQETLTQEARSFAREDGRQF 

10 20 30 40 50 60 

20 70 80 90 100 110 120 

orf 25a. pep VDADXIIAAAXXXXXSLEHASETQEGGRTFCXADLNITVPSETLADAKANSPLLYGETAL 
MM M I M M M I M M M M II i 1 II I M M M II M M I II I M II M I I 

orf 25-1 VDADKIIAAAYGLAFSLEHASETQEGGRTFCIADLNITVPSETLADAKANSPLLYGETAL 

70 80 90 100 110 120 

25 

130 140 150 160 170 180 

orf 25a. pep SDIVRQKTGGNVEFKDGVLTAAVRFLPVKDGQXAFVDNTVGMAAQTLSAALLPYGVKSIV 
1 M M M I M M I M I M M I I I M It I M M M M M M M M M M M ! M M I M M 
orf 25-1 SDIVRQKTGGNVEFKDGVLTAAVRFLPVKDGQTAFVDNTVGMAAQTLSAALLPYGVKSIV 
30 130 140 150 160 170 180 

190 200 210 220 230 240 

orf 25a. pep MID GKAVKKE D AVR I X S XXAREX E PSKXXPEDILE HN AAGG D AD V PQ AGE DAPEPEILHP 

I II M II I M I M II I III MM M II M M I M M M MUM M I M M I I 
35 orf 25-1 MIDGKAVKKEDAVRILSGKAREEEPSKPTPEDILEHNAAGGDAGVPQAAEGAPEPEILHP 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 25a . pep DDGERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEXRKWAQEKISNC 
40 M I II M I I I II II I II I I I II I II I! I M II I I I II M II I I I M 11 I M I M M I II 

orf 25-1 DDGERADTVT VSRGEVEEAR VQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNC 

250 260 270 280 290 300 

310 320 330 339 

45 orf 25a . pep RQAAAQADRQE Y AE Y LKLQC DT RMTRER I QYLRG Y S I DX 

II I II M I I II I M II II II M M I M II I II I II II M 
orf 25-1 RQAAAQADRQE YAEYLKLQCDTRMTRERIQYLRGYS I DX 

310 320 330 

Homology with a predicted ORF from N. gonorrhoeae

ORF25 shows 100% identity over a 60aa overlap with a predicted ORF (ORF25ng) from 
N. gonorrhoeae: 

orf25.pep                                 TDVQKELVGEQRKWAQEKISNCRQAAAQAD  30
                                          ||||||||||||||||||||||||||||||
orf25ng     VTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNCRQAAAQAD 308

orf25.pep   RQEYAEYLKLQCDTRMTRERIQYLRGYSID  60
            ||||||||||||||||||||||||||||||
orf25ng     RQEYAEYLKLQCDTRMTRERIQYLRGYSID 338

The complete length ORF25ng nucleotide sequence <SEQ ID 687> is:

1 ATGTATCGGA AACTCATTGC GCTGCCGTTT GCCCTGCTGC TTGCAGCGTG
51 CGGCAGGGAA GAACCGCCCA AGGCGTTGGA ATGCGCCAAC CCCGCCGTGT
101 TGCAGGACAT ACGCGGCAGT ATTCAGGAAA CGCTCACGCA GGAAGCGCGT
151 TCTTTCGCGC GCGAAGACGG CAGGCAGTTT GTCGATGCCG ACAAAATTAT
201 CGCCGCCGCC TACGGTTTGG CGTTTTCTTT GGAACACGCT TCGGAAACGC
251 AGGAAGGCGG GCGCACGTTC TGTATCGCCG ATTTGAACAT TACCGTGCCG
301 TCTGAAACGC TTGCCGATGC CGAGGCAAAC AGCCCCCTGC TGTATGGGGA
351 AACGTCTTTG GCAGACATCG TGCAGCAGAA GACGGGCGGC AATGTCGAGT
401 TTAAAGACGG CGTATTGACG GCAGCCGTCC GCTTCCTGCC CGCCAAAGAC
451 GCTCGGACGG CATTTATCGA CAACACGGTC GGTATGGCGA CGCAAACGCT
501 GTCTGCCGCG TTGCTGCCTT ACGGCGTGAA GAGCATCGTG ATGATAGACG
551 GCAAGGCGGT GACAAAAGAA GACGCGGTCA GGGTTTTGAG CGGCAAAGCC
601 CGTGAAGAAG AACCGTCCAA ACCCACCCCC GAAGACATTT TGGAACACAA
651 TGCCGCCGGC GGCGATGCGG GCGTACCCCA AGCCGCAGAA GGCGCACCCG
701 AACCCGAAAT CCTGCATCCC GACGACGTCG AGCGTGCCGA TACCGTTACC
751 GTATCACGGG GCGAAGTGGA AGAGGCGCGC GTACAAAACC AACGTGCGGA
801 ATCCGAAATT ACCAAACTTT GGGGAGGACT CGATACCGAC GTGCAAAAAG
851 AGTTGGTCGG CGAACAGCGC AAGTGGGCGC AGGAAAAAAT CAGcaactgc
901 cgACAAGCCG CCGCGCAGGC AGACCGGCAG GAATACGCCG AATACCTCAA
951 GCTCCAATGC GACACGCGGA TGACGCGCGA ACggaTACAG TATCTTCGCG
1001 GCTATTCCAT CGATTAG

This encodes a protein having amino acid sequence <SEQ ID 688>:

1 MYRKLIALPF ALLLAACGRE EPPKALECAN PAVLQDIRGS IQETLTQEAR
51 SFAREDGRQF VDADKIIAAA YGLAFSLEHA SETQEGGRTF CIADLNITVP
101 SETLADAEAN SPLLYGETSL ADIVQQKTGG NVEFKDGVLT AAVRFLPAKD
151 ARTAFIDNTV GMATQTLSAA LLPYGVKSIV MIDGKAVTKE DAVRVLSGKA
201 REEEPSKPTP EDILEHNAAG GDAGVPQAAE GAPEPEILHP DDVERADTVT
251 VSRGEVEEAR VQNQRAESEI TKLWGGLDTD VQKELVGEQR KWAQEKISNC
301 RQAAAQADRQ EYAEYLKLQC DTRMTRERIQ YLRGYSID*

ORF25ng and ORF25-1 show 95.9% identity in 338 aa overlap: 



10 20 30 40 50 60 

orf 25-1. pep MYRKLIALPFALLLAACGREEPPKALECANPAVLQGIRGNIQETLTQEARSFAREDGRQF 
f I I 1 I I i I I 1 i i I I I II I If I I 1 I I i I I I I I It II ! I i : i 1 II I I I I I I II I II I II I I 
orf25ng MYRKLIALPFALLLAACGREEPPKALECANPAVLQDIRGSIQETLTQEARSFAREDGRQF 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 25-1 . pep VDADKIIAAAYGLAFSLEHASETQEGGRTFCIADLNITVPSETLADAKANSPLLYGETAL 
I I I I M I I I I I I I I I I I I ! ! I I I II I I I I II I I 1 I I I I I I t I 11 1 M : I I I I I i I I 11 : I 
orf25ng VDADKIIAAAYGLAFSLEHASETQEGGRTFCIADLNITVPSETLADAEANSPLLYGETSL 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf 25-1 . pep S D I VRQKTGGN VE FK DGVLT AAVR FL PVK DGQT AFV DN T VGMAAQT L S AALL P YGVKS I V 
: I I I : I I t II M ! I I I I I I I I t I M I I : I I : : I I I : I I I 1! I I : I I I I I I I I I I II I I I I 
orf25ng ADIVQQKTGGNVEFKDGVLTAAVRFLPAKDARTAFIDNTVGMATQTLSAALLPYGVKSIV 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf 2 5-1. pep MIDGKAVKKEDAVRILSGKAREEEPSKPTPEDILEHNAAGGDAGVPQAAEGAPEPEILHP 
I I I I I I I I I I I I I : I I I ( I II I I I I I I I I I I I I I I I I I I I I I I I I I I I II ( I I M I I I I 
orf25ng MIDGKAVTKEDAVRVLSGKAREEEPSKPTPEDILEHNAAGGDAGVPQAAEGAPEPEILHP 

190 200 210 220 230 240 



250 260 270 280 290 300 

orf 25-1 . pep DDGERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNC 
II t I II I II I I I I I II I II I 1 II 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I 
orf25ng DDVERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNC 

250 260 270 280 290 300 



310 320 330 339 

orf 25-1 . pep RQAAAQADRQEYAEYLKLQCDTRMTRERIQYLRGYSIDX 
I I II I I I I II I I I 1 I II I M II I I I I I I I I I ! I I I I I I I 
orf25ng RQAAAQADRQEYAEYLKLQCDTRMTRERIQYLRGYSIDX 

310 320 330 






Based on this analysis, including the presence of a predicted prokaryotic membrane lipoprotein
lipid attachment site (underlined) in the gonococcal protein, it was predicted that the proteins from
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or
diagnostics, or for raising antibodies.

ORF25-1 (37kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
16A shows the results of affinity purification of the GST-fusion protein, and Figure 16B shows the
results of expression of the His-fusion in E. coli. Purified His-fusion protein was used to immunise
mice, whose sera were used for Western blot (Figure 16C), ELISA (positive result), and FACS
analysis (Figure 16D). These experiments confirm that ORF25-1 is a surface-exposed protein, and
that it is a useful immunogen.
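
As a rough plausibility check on the stated size, a protein's mass can be estimated from its length. The following is a minimal Python sketch that assumes an average residue mass of about 110 Da, a common rule of thumb; it is not the method by which the 37kDa figure was obtained, and the constant and function name are illustrative assumptions.

```python
# Rule-of-thumb average mass per amino acid residue (an assumption, in daltons).
AVERAGE_RESIDUE_MASS_DA = 110.0


def approx_mass_kda(n_residues: int) -> float:
    """Approximate protein mass in kDa from residue count alone."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


# ORF25-1 is 338 residues long according to the amino acid listing.
orf25_1_length = 338
estimate = approx_mass_kda(orf25_1_length)
print(f"{estimate:.1f} kDa")
```

An estimate of this kind ignores amino acid composition, signal peptide cleavage and lipidation, so it only indicates whether a stated mass is of the expected order.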

Figure 16E shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF25-1.

Example 82

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 689>:

1 ATGCAGCTGA TCGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT
51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG
101 GCATCGGTAT TCTGGwysGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC
151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGTCAGA
201 CGsyGATTGG TCGCTGGGCA AACCAAAAAT CTTGGTTTTC CkGATACTTT
251 TGGGTATTTT TACTTCCCTG CTGACCTACT CCGGCAGCAA T
//
851 AC TTCGCTGGTA
901 TTCGGCGGCA CTTGCGGCGT CTTTGCCGTC GTTCTCTGCA CGCTCGGCAC
951 GATTAAAACC GCCGACTATC CCAAAGCCGT TTGGCAGGGT GCGAAATCTA
1001 TGTTCGGCGC AATCGCCATT TTAATCCTCG CTTGGCTCAT CAGTACGGTT
1051 GTCGGCGAAA TGCACACCGG CGATTACCTC TCCACACTGG TTGCGGGCAA
1101 CATCCATCCC GGCTTCCTGC CCGTCATCCT CTTCCTGCTC GCCAGCGTGA
1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT TATGCTGCCG
1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAA CCCGCGCTGA TTATCCCGTG
1251 TATGTCCGCA GTAATGGCGG GGGCGGTATG CGGCGACCAC TGCTCGCCCA
1301 TTTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC
1351 GACCACGTTA CCTCGCAACT GCCTTACGCC TTAACCGTTG CCGCCGCCGC
1401 CGCATCGGGC TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGCT
1451 TTGGCACGAC AGGCATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT
1501 AAAAAA..

This corresponds to the amino acid sequence <SEQ ID 690; ORF26>: 

1 MQLIDYSHSF FSVVPPFLAL ALAVITRRVL LSLGIGILXX VAFLVGGNPV
51 DGLTHLKDMV VGLAWSDXDW SLGKPKILVF XILLGIFTSL LTYSGSN...
//
251 TSLV
301 FGGTCGVFAV VLCTLGTIKT ADYPKAVWQG AKSMFGAIAI LILAWLISTV
351 VGEMHTGDYL STLVAGNIHP GFLPVILFLL ASVMAFATGT SWGTFGIMLP
401 IAAAMAVKVE PALIIPCMSA VMAGAVCGDH CSPISDTTIL SSTGARCNHI
451 DHVTSQLPYA LTVAAAAASG YLALGLTKSA LLGFGTTGIV LAVLIFLLKD
501 KK..



Further work revealed the complete nucleotide sequence <SEQ ID 691>:
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1 ATGCAGCTGA TCGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT 

51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG 

101 GCATCGGTAT TCTGGTCGGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC 

151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGTCAGA 

201 CGGCGATTGG TCGCTGGGCA AACCAAAAAT CTTGGTTTTC CTGATACTTT 

251 TGGGTATTTT TACTTCCCTG CTGACCTACT CCGGCAGCAA TCAGGCGTTT 

301 GCCGACTGGG CAAAACGGCA CATTAAAAAC CGGCGCGGCG CGAAAATGCT 

351 GACCGCCTGC CTCGTGTTCG TAACCTTTAT CGACGACTAT TTCCACAGTC 

401 TCGCCGTCGG TGCGATTGCC CGCCCCGTTA CCGACAAGTT TAAAGTTTCC

451 CGCACCAAAC TCGCCTACAT CCTCGACTCC ACTGCCGCTC CTATGTGCGT

501 GCTGATGCCC GTTTCAAGCT GGGGCGCGTC GATTATCGCC ACGCTTGCCG

551 GACTGCTCGT TACCTACAAA ATCACCGAAT ACACGCCGAT GGGGACGTTT

601 GTCGCCATGA GCCTGATGAA CTATTACGCA CTGTTTGCCC TGATTATGGT

651 GTTCGTCGTC GCATGGTTTT CCTTCGACAT CGGCTCGATG GCACGTTTCG

701 AACAAGCCGC GTTGAACGAA GCCCACGATG AAACTGCCGT TTCAGACGCT

751 ACCAAAGGTC GTGTTTACGC ACTGATTATT CCCGTTTTGG CCTTAATCGC

801 CTCAACGGTT TCCGCCATGA TCTACACCGG CGCGCAGGCA AGCGAAACCT 

851 TCAGCATTTT GGGGGCATTT GAAAACACGG ACGTAAACAC TTCGCTGGTA 

901 TTCGGCGGCA CTTGCGGCGT CCTTGCCGTC GTTCTCTGCA CGCTCGGCAC 

951 GATTAAAACC GCCGACTATC CCAAAGCCGT TTGGCAGGGT GCGAAATCTA 

1001 TGTTCGGCGC AATCGCCATT TTAATCCTCG CTTGGCTCAT CAGTACGGTT 

1051 GTCGGCGAAA TGCACACCGG CGATTACCTC TCCACACTGG TTGCGGGCAA 

1101 CATCCATCCC GGCTTCCTGC CCGTCATCCT CTTCCTGCTC GCCAGCGTGA 

1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT TATGCTGCCG 

1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAA CCCGCGCTGA TTATCCCGTG 

1251 TATGTCCGCA GTAATGGCGG GGGCGGTATG CGGCGACCAC TGCTCGCCCA 

1301 TTTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC 

1351 GACCACGTTA CCTCGCAACT GCCTTACGCC TTAACCGTTG CCGCCGCCGC 

1401 CGCATCGGGC TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGCT

1451 TTGGCACGAC AGGCATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT

1501 AAAAAACGCG CCAACGCCTG A 
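
A listing in this numbered, ten-residue block layout can be joined into one contiguous string before any further analysis. The following is a minimal sketch, assuming the listing is available as plain text; the function name `parse_listing` is hypothetical and not part of the original disclosure. It discards position numbers, spaces and placeholder dots, and upper-cases the result, so any case distinction in the listing is lost.

```python
import re

def parse_listing(text):
    """Join a numbered, space-blocked sequence listing into one string.

    Position numbers, whitespace and punctuation are discarded; only
    letters are kept, and the result is upper-cased.
    """
    blocks = []
    for line in text.splitlines():
        # Keep letters only: this also removes split numbers such as "4 01"
        blocks.append(re.sub(r"[^A-Za-z]", "", line))
    return "".join(blocks).upper()


if __name__ == "__main__":
    listing = """1 ATGCAGCTGA TCGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT

51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG"""
    seq = parse_listing(listing)
    print(len(seq), seq[:30])
```

Comparing the length of the joined string with the last position number in the listing is a quick way to spot dropped or duplicated characters.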

This corresponds to the amino acid sequence <SEQ ID 692; ORF26-1>:

1 MQLIDYSHSF FSVVPPFLAL ALAVITRRVL LSLGIGILVG VAFLVGGNPV

51 DGLTHLKDMV VGLAWSDGDW SLGKPKILVF LILLGIFTSL LTYSGSNQAF

101 ADWAKRHIKN RRGAKMLTAC LVFVTFIDDY FHSLAVGAIA RPVTDKFKVS

151 RTKLAYILDS TAAPMCVLMP VSSWGASIIA TLAGLLVTYK ITEYTPMGTF

201 VAMSLMNYYA LFALIMVFVV AWFSFDIGSM ARFEQAALNE AHDETAVSDA

251 TKGRVYALII PVLALIASTV SAMIYTGAQA SETFSILGAF ENTDVNTSLV

301 FGGTCGVLAV VLCTLGTIKT ADYPKAVWQG AKSMFGAIAI LILAWLISTV

351 VGEMHTGDYL STLVAGNIHP GFLPVILFLL ASVMAFATGT SWGTFGIMLP

401 IAAAMAVKVE PALIIPCMSA VMAGAVCGDH CSPISDTTIL SSTGARCNHI

451 DHVTSQLPYA LTVAAAAASG YLALGLTKSA LLGFGTTGIV LAVLIFLLKD

501 KKRANA*
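
The correspondence between a nucleotide sequence and its amino acid sequence can be checked by translating the open reading frame. The sketch below assumes the standard genetic code and reading frame 1; the function name `translate` is hypothetical. As a simplification, any codon containing a character other than A, C, G or T is reported as X, which is stricter than necessary for codons such as ACN that are unambiguous.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

def translate(dna):
    """Translate frame 1 of a DNA string; unknown codons become 'X'."""
    dna = dna.upper()
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored
    for i in range(0, len(dna) - len(dna) % 3, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    # First 30 nucleotides of SEQ ID 691
    print(translate("ATGCAGCTGATCGACTATTCACATTCATTT"))
```

Applied to the first thirty nucleotides of SEQ ID 691, the function reproduces the first ten residues listed for ORF26-1.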

Computer analysis of this amino acid sequence gave the following results: 

Homology with the hypothetical transmembrane protein HI1586 of H. influenzae (accession number P44263)
ORF26 and HI1586 show 53% and 49% amino acid identity in 97 and 221 aa overlap at the
N-terminus and C-terminus, respectively:

Orf26     1 MQLIDYSHSFFSVVPPFLALALAVITRRVXXXXXXXXXXXVAFLVGGNPVDGLTHLKDMV 60
            M+LID+S S +S+VP  LA+ LA+ TRRV              L          +L   V
HI1586   14 MELIDFSSSVWSIVPALLAIILAIATRRVLVSLSAGIIIGSLMLSDWQIGSAFNYLVKNV 73

Orf26    61 VGLAWSDXDWSLGKPKILVFXILLGIFTSLLTYSGSN 97
            V L ++D + +     I++F +LLG+ T+LLT SGSN
HI1586   74 VSLVYADGEIN-SNMNIVLFLLLLGVLTALLTVSGSN 109

//

Orf26    86 IFTSLLTYSGS--NTSLVFGGTCGVFAVVLCTL--GTIKTADYPKAVWQGAKSMFGXXXX 141
            +F+ L T+  +   TSLV GG C +    L  +    +   +Y ++   G KSM G
HI1586  299 VFSVLGTFENTVVGTSLVVGGFCSIIISTLLIILDRQVSVPEYVRSWIVGIKSMSGAIAI 358

Orf26   142 XXXXXXXSTVVGEMHTGDYLSTLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLP 201
                   + +VG+M TG YLS+LV+GNI   FLPVILF+L + MAF+TGTSWGTFGIMLP
HI1586  359 LFFAWTINKIVGDMQTGKYLSSLVSGNIPMQFLPVILFVLGAAMAFSTGTSWGTFGIMLP 418

Orf26   202 IAAAMAVKVEPALIIPCMSAVMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQXXXX 261
            IAAAMA    P L++PC+SAVMAGAVCGDHCSP+SDTTILSSTGA+CNHIDHVT+Q
HI1586  419 IAAAMAANAAPELLLPCLSAVMAGAVCGDHCSPVSDTTILSSTGAKCNHIDHVTTQLPYA 478

Orf26   262 XXXXXXXXXXXXXXXXXKSALLGFGTTGIVLAVLIFLLKDK 302
                              S L GF  T + L V+IF +K +
HI1586  479 ATVATATSIGYIVVGFTYSGLAGFAATAVSLIVIIFAVKKR 519

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF26 shows 58.2% identity over a 502aa overlap with an ORF (ORF26a) from strain A of N.
meningitidis:

                    10        20        30        40        50        60
orf26.pep   MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILXXVAFLVGGNPVDGLTHLKDMV
            ||||||||||||||||||||||||||||||||||||||  ||||||||||||||||||||
orf26a      MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV
                    10        20        30        40        50        60

                    70        80        90       99
orf26.pep   VGLAWSDXDWSLGKPKILVFXILLGIFTSLLTYSGSNXX
            ||||||| |||||||| ||| ||||||||||||||||
orf26a      VGLAWSDGDWSLGKPKXLVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRRGAKMLTAC
                    70        80        90       100       110       120

orf26.pep
orf26a      LVFVTFIDDYFHSLAVGAXARPVTDKFKVSRAKLAYILDSTAAPMCVLMPVSSWGASIIA
                   130       140       150       160       170       180

orf26.pep
orf26a      TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFVVAWFSFDIGSMARFEQAALNE
                   190       200       210       220       230       240

                                                        100       110
orf26.pep                                                           TSLV
                                                                    ||||
orf26a      AHDETAVSDGSWGRVYALIIPVLALIASTVSAMIYTGAQASETFSILGAFENTDVNTSLV
                   250       260       270       280       290       300

                120       130       140       150       160       170
orf26.pep   FGGTCGVFAVVLCTLGTIKTADYPKAVWQGAKSMFGAIAILILAWLISTVVGEMHTGDYL
            |||||||:||||||||||| ||||||||||||||||||||||||||||||||||||||||
orf26a      FGGTCGVLAVVLCTLGTIKIADYPKAVWQGAKSMFGAIAILILAWLISTVVGEMHTGDYL
                   310       320       330       340       350       360

                180       190       200       210       220       230
orf26.pep   STLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSA
            ||||||||||||| |||||||||||||||||||||||||||||||||||:|:||||||||
orf26a      STLVAGNIHPGFLXVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVDPSLIIPCMSA
                   370       380       390       400       410       420

                240       250       260       270       280       290
orf26.pep   VMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQLPYALTVAAAAASGYLALGLTKSA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf26a      VMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQLPYALTVAAAAASGYLALGLTKSA
                   430       440       450       460       470       480

                300       310
orf26.pep   LLGFGTTGIVLAVLIFLLKDKK
            |||||:||||||||||||||||
orf26a      LLGFGXTGIVLAVLIFLLKDKKRANAX
                   490       500

The complete length ORF26a nucleotide sequence <SEQ ID 693> is: 

1 ATGCAGCTGA TCGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT 

51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG

101 GCATCGGTAT TCTGGTCGGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC 

151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGTCAGA 

201 CGGCGATTGG TCGCTGGGCA AACCAAAANT CTTGGTTTTC CTGATACTTT 

251 TGGGTATTTT TACTTCCCTG CTGACCTACT CCGGCAGCAA TCAGGCGTTT 

301 GCCGACTGGG CAAAACGGCA CATTAAAAAC CGGCGCGGCG CGAAAATGCT

351 GACCGCCTGC CTCGTGTTCG TAACCTTTAT CGACGACTAT TTCCACAGTC

401 TCGCCGTCGG TGCGNTTGCC CGCCCCGTTA CCGACAAGTT TAAAGTTTCC

451 CGCGCCAAAC TCGCCTACAT CCTCGACTCC ACTGCCGCGC CTATGTGCGT

501 GCTGATGCCC GTTTCAAGCT GGGGCGCGTC GATTATCGCC ACGCTTGCCG 

551 GACTGCTCGT TACCTACAAA ATCACCGAAT ACACGCCGAT GGGGACGTTT 

601 GTCGCCATGA GCCTGATGAA CTATTACGCA CTGTTTGCCC TGATTATGGT 

651 GTTCGTCGTC GCATGGTTCT CCTTCGACAT CGGCTCGATG GCACGTTTCG 

701 AACAAGCCGC GTTGAACGAA GCCCACGATG AAACTGCCGT TTCAGACGGC 

751 AGCTGGGGCA GGGTTTACGC ATTGATTATT CCCGTTTTGG CCTTAATCGC 

801 CTCAACGGTT TCCGCCATGA TCTACACCGG TGCACAGGCA AGCGAAACCT 

851 TCAGCATTTT GGGTGCATTT GAAAATACGG ACGTGAACAC TTCGCTGGTA 

901 TTCGGCGGCA CTTGCGGCGT GCTTGCCGTC GTCCTCTGCA CGCTCGGCAC 

951 GATTAAAATC GCCGATTATC CCAAAGCCGT TTGGCAGGGT GCGAAATCCA 

1001 TGTTCGGCGC AATCGCCATT TTAATCCTTG CCTGGCTCAT CAGTACGGTT 

1051 GTCGGCGAAA TGCACACAGG CGACTACCTC TCCACGCTGG TTGCGGGCAA 

1101 CATCCATCCC GGCTTCCTGN CCGTCATCCT TTTCCTGCTC GCCAGCGTGA 

1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT CATGCTGCCG 

1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAT CCCTCACTGA TTATCCCGTG 

1251 TATGTCCGCC GTGATGGCGG GGGCGGTATG CGGCGACCAC TGCTCGCCCA 

1301 TTTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC 

1351 GACCACGTTA CNTCGCAACT GCCTTACGCC TTAACCGTTG CCGCCGCCGC 

1401 CGCATCGGGN TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGTT 

1451 TTGGCANGAC AGGCATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT 

1501 AAAAAACGCG CCAACGCCTG A 

This encodes a protein having amino acid sequence <SEQ ID 694>: 

1 MQLIDYSHSF FSVVPPFLAL ALAVITRRVL LSLGIGILVG VAFLVGGNPV

51 DGLTHLKDMV VGLAWSDGDW SLGKPKXLVF LILLGIFTSL LTYSGSNQAF

101 ADWAKRHIKN RRGAKMLTAC LVFVTFIDDY FHSLAVGAXA RPVTDKFKVS

151 RAKLAYILDS TAAPMCVLMP VSSWGASIIA TLAGLLVTYK ITEYTPMGTF

201 VAMSLMNYYA LFALIMVFVV AWFSFDIGSM ARFEQAALNE AHDETAVSDG

251 SWGRVYALII PVLALIASTV SAMIYTGAQA SETFSILGAF ENTDVNTSLV

301 FGGTCGVLAV VLCTLGTIKI ADYPKAVWQG AKSMFGAIAI LILAWLISTV

351 VGEMHTGDYL STLVAGNIHP GFLXVILFLL ASVMAFATGT SWGTFGIMLP

401 IAAAMAVKVD PSLIIPCMSA VMAGAVCGDH CSPISDTTIL SSTGARCNHI

451 DHVTSQLPYA LTVAAAAASG YLALGLTKSA LLGFGXTGIV LAVLIFLLKD

501 KKRANA*

ORF26a and ORF26-1 show 97.8% identity in 506 aa overlap: 

                    10        20        30        40        50        60
orf26a.pep  MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf26-1     MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf26a.pep  VGLAWSDGDWSLGKPKXLVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRRGAKMLTAC
            |||||||||||||||| |||||||||||||||||||||||||||||||||||||||||||
orf26-1     VGLAWSDGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRRGAKMLTAC
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf26a.pep  LVFVTFIDDYFHSLAVGAXARPVTDKFKVSRAKLAYILDSTAAPMCVLMPVSSWGASIIA
            |||||||||||||||||| ||||||||||||:||||||||||||||||||||||||||||
orf26-1     LVFVTFIDDYFHSLAVGAIARPVTDKFKVSRTKLAYILDSTAAPMCVLMPVSSWGASIIA
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf26a.pep  TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFVVAWFSFDIGSMARFEQAALNE
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf26-1     TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFVVAWFSFDIGSMARFEQAALNE
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf26a.pep  AHDETAVSDGSWGRVYALIIPVLALIASTVSAMIYTGAQASETFSILGAFENTDVNTSLV
            |||||||||:: ||||||||||||||||||||||||||||||||||||||||||||||||
orf26-1     AHDETAVSDATKGRVYALIIPVLALIASTVSAMIYTGAQASETFSILGAFENTDVNTSLV
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf26a.pep  FGGTCGVLAVVLCTLGTIKIADYPKAVWQGAKSMFGAIAILILAWLISTVVGEMHTGDYL
            ||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||
orf26-1     FGGTCGVLAVVLCTLGTIKTADYPKAVWQGAKSMFGAIAILILAWLISTVVGEMHTGDYL
                   310       320       330       340       350       360

                   370       380       390       400       410       420
orf26a.pep  STLVAGNIHPGFLXVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVDPSLIIPCMSA
            ||||||||||||| |||||||||||||||||||||||||||||||||||:|:||||||||
orf26-1     STLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSA
                   370       380       390       400       410       420

                   430       440       450       460       470       480
orf26a.pep  VMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQLPYALTVAAAAASGYLALGLTKSA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf26-1     VMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQLPYALTVAAAAASGYLALGLTKSA
                   430       440       450       460       470       480

                   490       500
orf26a.pep  LLGFGXTGIVLAVLIFLLKDKKRANAX
            |||||:|||||||||||||||||||||
orf26-1     LLGFGTTGIVLAVLIFLLKDKKRANAX
                   490       500
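
Identity figures of the kind quoted for these overlaps can be recomputed from two aligned rows. The sketch below is a minimal illustration under stated assumptions: both rows are already aligned to equal length, gap columns (written `-`) count toward the overlap length, and an ambiguous residue X is never counted as a match. The alignment program used for the original analysis may apply different conventions, so small discrepancies are possible; the function name `percent_identity` is hypothetical.

```python
def percent_identity(row_a, row_b, gap="-", unknown="X"):
    """Percentage of aligned columns holding the same, unambiguous residue."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    if not row_a:
        raise ValueError("empty alignment")
    matches = sum(
        1
        for a, b in zip(row_a, row_b)
        if a == b and a != gap and a != unknown
    )
    return 100.0 * matches / len(row_a)


if __name__ == "__main__":
    # Residues 241-260 of ORF26a and ORF26-1, taken from the alignment above
    print(percent_identity("AHDETAVSDGSWGRVYALII", "AHDETAVSDATKGRVYALII"))
```

For the twenty-residue window shown, three columns differ (G/A, S/T and W/K), so seventeen of twenty columns match.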

Homology with a predicted ORF from N. gonorrhoeae 
ORF26 shows 94.8% and 99% identity in 97 and 206 aa overlap at the N-terminus and C-terminus,
respectively, with a predicted ORF (ORF26ng) from N. gonorrhoeae: 

orf26.pep   MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILXXVAFLVGGNPVDGLTHLKDMV 60
            ||||||||||||||||||||||||||||||||||||||  ||||||||||||||||||||
orf26ng     MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV 60

orf26.pep   VGLAWSDXDWSLGKPKILVFXILLGIFTSLLTYSGSN 97
            |||||:| |||||||||||| ||||||||||||||||
orf26ng     VGLAWADGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRCGAKMLTAC 120

//

orf26.pep                                 TSLVFGGTCGVFAVVLCTLGTIKTADYPKA 326
                                          |||||||||||:||||||:|||||||||||
orf26ng     ASTVSAMIYTGAQASETFSILGAFENTDVNTSLVFGGTCGVLAVVLCTFGTIKTADYPKA 326

orf26.pep   VWQGAKSMFGAIAILILAWLISTVVGEMHTGDYLSTLVAGNIHPGFLPVILFLLASVMAF 386
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf26ng     VWQGAKSMFGAIAILILAWLISTVVGEMHTGDYLSTLVAGNIHPGFLPVILFLLASVMAF 386

orf26.pep   ATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSAVMAGAVCGDHCSPISDTTILSSTGAR 446
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf26ng     ATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSAVMAGAVCGDHCSPISDTTILSSTGAR 446

orf26.pep   CNHIDHVTSQLPYALTVAAAAASGYLALGLTKSALLGFGTTGIVLAVLIFLLKDKK 502
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf26ng     CNHIDHVTSQLPYALTVAAAAASGYLALGLTKSALLGFGTTGIVLAVLIFLLKDKKRADV 506

The complete length ORF26ng nucleotide sequence <SEQ ID 695> is:

1 ATGCAGCTGA TTGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT 

51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG 

101 GCATCGGTAT TTTGGTCGGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC 

151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGGCAGA 

201 CGGCGATTGG TCGCTGGGCA AACCAAAAAT CTTGGTTTTC CTGATACTTT 

251 TGGGCATTTT CACTTCACTG CTGACCTACT CCGGCAGCAA TCAGGCGTTT

301 GCCGACTGGG CAAAACGGCA CATTAAAAAC CGGTGCGGCG CGAAAATGCT 

351 GACCGCCTGC CTCGTGTTCG TAACCTTTAT CGACGACTAT TTCCACAGCC 

401 TCGCCGTCGG TGCGATTGCC CGCCCCGTTA CCGACAAGTT TAAAGTTTCC

451 CGCGCCAAAC TCGCCTACAT CCTCGACTCC ACTGCCTCGC CCATGTGCGT

501 GCTGATGCCC GTTTCAAGCT GGGGCGCGTC GATTATCGCC ACGCTTGCCG 

551 GATTGCTCGT TACCTACAAA ATTACCGAAT ACACGCCGAT GGGGACGTTT 

601 GTCGCCATGA GCCTGATGAA CTATTACGCG CTGTTTGCCC TGATTATGGT 

651 ATTCGTCGTC GCATGGTTCT CCTTCGACAT CGGCTCGAtg gCGCGTTTCG 

701 AACAGGCTGC GTTGAACGAA gcccaggacg aaaccgccgc tTCAGACgCT 

751 ACCAAAGGTC GTGTTTACGC ATTGATTATT CCCGTTTTGG CCTTAATCGC 

801 CTCAACGGTT TCCGCCATGA TCTACACCGG CGCGCAGGCA AGCGAAACCT 

851 TCAGCATTTT GGGGGCATTT GAAAATACCG ACGTAAACAC TTCGCTGGTA 

901 TTCGGCGGCA CTTGCGGCGT GCTTGCCGTC GTCCTCTGCA CGTTCGGCAC 

951 GATTAAAACC GCCGATTATC CCAAAGCCGT GTGGCAGGGT GCGAAATCCA 

1001 TGTTCGGCGC AATCGCCATT TTAATCCTCG CCTGGCTCAT CAGTACGGTT 

1051 GTCGGCGAAA TGCACACGGG CGACTACCTC TCCACGCTGG TTGCGGGCAA 

1101 CATCCATCCC GGCTTCCTGC CCGTCATCCT CTTCCTGCTC GCCAGCGTGA 

1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT TATGCTGCCG 

1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAA CCCGCGCTGA TTAtCCCGTG 

1251 TATGTCCGCA GTAATGGCGG GGGCGGTATG CGGCGACCAC TGTTCGCCCA 

1301 TCTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC 

1351 GACCACGTTA CCTCGCAACT GCCTTATGCC CTGACGGTTG CCGCCGCCGC 

1401 CGCATCGGGC TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGCT 

1451 TTGGCACGAC CGGTATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT

1501 AAAAAACGCG CCGACGTTTG A 

This encodes a protein having amino acid sequence <SEQ ID 696>: 



1 MQLIDYSHSF FSVVPPFLAL ALAVITRRVL LSLGIGILVG VAFLVGGNPV

51 DGLTHLKDMV VGLAWADGDW SLGKPKILVF LILLGIFTSL LTYSGSNQAF

101 ADWAKRHIKN RCGAKMLTAC LVFVTFIDDY FHSLAVGAIA RPVTDKFKVS

151 RAKLAYILDS TASPMCVLMP VSSWGASIIA TLAGLLVTYK ITEYTPMGTF

201 VAMSLMNYYA LFALIMVFVV AWFSFDIGSM ARFEQAALNE AQDETAASDA

251 TKGRVYALII PVLALIASTV SAMIYTGAQA SETFSILGAF ENTDVNTSLV

301 FGGTCGVLAV VLCTFGTIKT ADYPKAVWQG AKSMFGAIAI LILAWLISTV

351 VGEMHTGDYL STLVAGNIHP GFLPVILFLL ASVMAFATGT SWGTFGIMLP

401 IAAAMAVKVE PALIIPCMSA VMAGAVCGDH CSPISDTTIL SSTGARCNHI

451 DHVTSQLPYA LTVAAAAASG YLALGLTKSA LLGFGTTGIV LAVLIFLLKD

501 KKRADV*



ORF26ng and ORF26-1 show 98.4% identity in 505 aa overlap: 



                    10        20        30        40        50        60
orf26-1.pep MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf26ng     MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf26-1.pep VGLAWSDGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRRGAKMLTAC
            |||||:||||||||||||||||||||||||||||||||||||||||||||| ||||||||
orf26ng     VGLAWADGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRCGAKMLTAC
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf26-1.pep LVFVTFIDDYFHSLAVGAIARPVTDKFKVSRTKLAYILDSTAAPMCVLMPVSSWGASIIA
            |||||||||||||||||||||||||||||||:||||||||||:|||||||||||||||||
orf26ng     LVFVTFIDDYFHSLAVGAIARPVTDKFKVSRAKLAYILDSTASPMCVLMPVSSWGASIIA
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf26-1.pep TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFVVAWFSFDIGSMARFEQAALNE
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf26ng     TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFVVAWFSFDIGSMARFEQAALNE
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf26-1.pep AHDETAVSDATKGRVYALIIPVLALIASTVSAMIYTGAQASETFSILGAFENTDVNTSLV
            |:||||:|||||||||||||||||||||||||||||||||||||||||||||||||||||
orf26ng     AQDETAASDATKGRVYALIIPVLALIASTVSAMIYTGAQASETFSILGAFENTDVNTSLV
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf26-1.pep FGGTCGVLAVVLCTLGTIKTADYPKAVWQGAKSMFGAIAILILAWLISTVVGEMHTGDYL
            ||||||||||||||:|||||||||||||||||||||||||||||||||||||||||||||
orf26ng     FGGTCGVLAVVLCTFGTIKTADYPKAVWQGAKSMFGAIAILILAWLISTVVGEMHTGDYL
                   310       320       330       340       350       360

                   370       380       390       400       410       420
orf26-1.pep STLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf26ng     STLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSA
                   370       380       390       400       410       420

                   430       440       450       460       470       480
orf26-1.pep VMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQLPYALTVAAAAASGYLALGLTKSA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf26ng     VMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQLPYALTVAAAAASGYLALGLTKSA
                   430       440       450       460       470       480

                   490       500
orf26-1.pep LLGFGTTGIVLAVLIFLLKDKKRANAX
            ||||||||||||||||||||||||::|
orf26ng     LLGFGTTGIVLAVLIFLLKDKKRADVX
                   490       500

In addition, ORF26ng shows significant homology to a hypothetical H. influenzae protein:

sp|P44263|YF86_HAEIN HYPOTHETICAL PROTEIN HI1586 >gi|1074850|pir||C64037 hypothetical
protein HI1586 - Haemophilus influenzae (strain Rd KW20) >gi|1574427 (U32832) H.
influenzae predicted coding region HI1586 [Haemophilus influenzae] Length = 519
Score = 538 bits (1370), Expect = e-152

Identities = 280/507 (55%), Positives = 346/507 (68%), Gaps = 7/507 (1%)

Query: 1   MQLIDYSHSFFSVVPPFLALALAVITRRXXXXXXXXXXXXXAFLVGGNPVDGLTHLKDMV 60
           M+LID+S S +S+VP  LA+ LA+ TRR               L          +L   V
Sbjct: 14  MELIDFSSSVWSIVPALLAIILAIATRRVLVSLSAGIIIGSLMLSDWQIGSAFNYLVKNV 73

Query: 61
           V L +ADG+ + I++FL+LLG+ T+LLT SGSN+AFA+WA+ IK R GAK+L A
Sbjct: 74

Query: 121
           LVFVTFIDDYFHSLAVGAIARPVTD+FKVSRAKLAYILDSTA+PMCV+MPVSSWGA n
Sbjct: 133

Query: 181
           -f GLL TY ITEYTP+G FVAMS MN+YA+F++IMVF VA+FSFDI SM R E+ AL
Sbjct: 193

Query: 241 AQDETAASDATKGRVYALIIPVLALIASTVSAMIYTGAQASETFSILGAFENTDVN 296
           + D+ TKG+V LI+P+L LI +TVS MIYTGA+A + FS+LG FENT V
Sbjct: 253

Query: 297
           TSLV GG C ++ +++ + +Y ++ G KSM G + +VG+M
Sbjct: 313

Query: 355 HTGDYLSTLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVEPALI 414
            TG YLS+LV+GNI   FLPVILF+L + MAF+TGTSWGTFGIMLPIAAAMA    P L+
Sbjct: 373 QTGKYLSSLVSGNIPMQFLPVILFVLGAAMAFSTGTSWGTFGIMLPIAAAMAANAAPELL 432

Query: 415 IPCMSAVMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQXXXXXXXXXXXXXXXXXX 474
           +PC+SAVMAGAVCGDHCSP+SDTTILSSTGA+CNHIDHVT+Q
Sbjct: 433 LPCLSAVMAGAVCGDHCSPVSDTTILSSTGAKCNHIDHVTTQLPYAATVATATSIGYIVV 492

Query: 475 XXXKSALLGFGTTGIVLAVLIFLLKDK 501
               S L GF  T + L V+IF +K +
Sbjct: 493 GFTYSGLAGFAATAVSLIVIIFAVKKR 519

Based on this analysis, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 83 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 697>: 

1 . . AAGCAATGGT ATGCCGACGN . AGTATCAAG ACGGAAATGG TTATGGTCAA 

51 CGATGAGCCT GCCAAAATTC TGACTTGGGA TGAAAGCGGC CGATTACTCT 

101 CGGAACTGTC TATCCGCCAC CATCAACGCA ACGGGGTGGT TTTGGAGTGG 

151 TATGAAGATG GTTCTAAAAA GAGCGAAGT . GTTTATCAGG ATGACAAGTT 

201 GGTCAGGAAA ACCCAGTGGG ATAAGGATGG TTATTTAATC GAACCCTGA 

This corresponds to the amino acid sequence <SEQ ID 698; ORF27>: 

1 ..KQWYADXSIK TEMVMVNDEP AKILTWDESG RLLSELSIRH HQRNGVVLEW
51 YEDGSKKSEX VYQDDKLVRK TQWDKDGYLI EP*

Further work revealed the complete nucleotide sequence <SEQ ID 699>: 

1 ATGAAAAAAT TATCTCGGAT TGTATTTTCA ACTGTCCTGT TGGGTTTTTC 

51 GGCCGCTTTG CCGGCGCAGA CCTATTCTGT TTATTTTAAT CAGAACGGAA 

101 AGCTGACGGC GACGATGTCT TCTGCCGCTT ATATCAGGCA ATATAGTGTG 

151 GTGGCGGGTA TTGCGCACGC GCAGGATTTT TATTATCCGT CGATGAAGAA 

201 ATATTCTGAA CCTTATATCG TTGCTTCAAC GCAAATCAAA TCTTTTGTGC 

251 CTACCCTGCA AAACGGTATG TTGATTTTGT GGCATTTTAA TGGTCAGAAA 

301 AAAATGGCGG GGGGCTTCAG CAAGGGTAAG CCGGACGGGG AGTGGGTCAA 

351 CTGGTATCCG AACGGTAAAA AATCTGCCGT TATGCCTTAT AAAAATGGCT 

401 TGAGTGAGGG TACGGGATAC CGCTATTACC GTAACGGCGG CAAGGAAAGC

451 GAAATCCAGT TTAAGCAAAA TAAGGCAAAC GGCGTATGGA AGCAATGGTA

501 TGCCGACGGC AGTATCAAGA CGGAAATGGT TATGGTCAAC GATGAGCCTG 

551 CCAAAATTCT GACTTGGGAT GAAAGCGGCC GATTACTCTC GGAACTGTCT 

601 ATCCGCCACC ATCAACGCAA CGGGGTGGTT TTGGAGTGGT ATGAAGATGG 

651 TTCTAAAAAG AGCGAAGCTG TTTATCAGGA TGACAAGTTG GTCAGGAAAA 

701 CCCAGTGGGA TAAGGATGGT TATTTAATCG AACCCTGA 

This corresponds to the amino acid sequence <SEQ ID 700; ORF27-1>:

1 MKKLSRIVFS TVLLGFSAAL PAQTYSVYFN QNGKLTATMS SAAYIRQYSV 

51 VAGIAHAQDF YYPSMKKYSE PYIVASTQIK SFVPTLQNGM LILWHFNGQK 

101 KMAGGFSKGK PDGEWVNWYP NGKKSAVMPY KNGLSEGTGY RYYRNGGKES 

151 EIQFKQNKAN GVWKQWYADG SIKTEMVMVN DEPAKILTWD ESGRLLSELS 

201 IRHHQRNGVV LEWYEDGSKK SEAVYQDDKL VRKTQWDKDG YLIEP* 
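
A partial sequence containing ambiguous residues can be located within the complete sequence by treating each X as a wildcard. The sketch below is a minimal illustration and assumes an ungapped match; the function name `find_fragment` is hypothetical. It returns the zero-based offset of the first position at which the fragment fits, or -1 if there is none.

```python
def find_fragment(fragment, full, wildcard="X"):
    """Return the 0-based offset of fragment in full, or -1 if absent.

    A wildcard character in the fragment matches any residue.
    """
    n = len(fragment)
    for start in range(len(full) - n + 1):
        window = full[start:start + n]
        if all(f == wildcard or f == w for f, w in zip(fragment, window)):
            return start
    return -1


if __name__ == "__main__":
    # Start of the partial ORF27 sequence against a window of ORF27-1
    print(find_fragment("KQWYADXSIK", "KANGVWKQWYADGSIKTEMVMVN"))
```

In the example, the first ten residues of the partial ORF27 sequence are placed within a short window of ORF27-1; the X in the fragment lines up with a G in the complete sequence.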

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF27 shows 91.5% identity over a 82aa overlap with an ORF (ORF27a) from strain A of N. 
meningitidis: 

                                                  10        20        30
orf27.pep                                 KQWYADXSIKTEMVMVNDEPAKILTWDESG
                                          |||||| :||||||||||||||||||||||
orf27a      LSEGTGXRYYRNGGKESEIQFKQNKANGVWKQWYADGNIKTEMVMVNDEPAKILTWDESG
                   140       150       160       170       180       190

                    40        50        60        70        80
orf27.pep   RLLSELSIRHHQRNGVVLEWYEDGSKKSEXVYQDDKLVRKTQWDKDGYLIEPX
            ||||||||:|| ||||||||||||||| | |||||||||||||| ||||||||
orf27a      RLLSELSIHHHXRNGVVLEWYEDGSKKXEAVYQDDKLVRKTQWDXDGYLIEPX
                   200       210       220       230       240

The complete length ORF27a nucleotide sequence <SEQ ID 701> is:

1 ATGAAAAAAT TATCTCGGAT TGTATTTTCA ACTGTCCTGT TGGGTTTTTC 

51 GGCCGCTTTG CCGGCGCAGA NCTATTCTGT TTATTTTAAT CAGAACGGGA 

101 AACTGACGGC GACGNTGTCT TCTGCCGCNT ATATCAGGCA ATATAGTGTG 

151 GCGGAGGGTA TTGCGCACGC GCAGGANTTT TANTATCCGT CGATGAAGAA 

201 ATATTCCGAA CCTTATATCG TTGCTTCAAC GCAAATCAAA TCTTTTGTGC 

251 CTACCCTGCA AAACGGTATG TTGATTTTGT GGCATTTTAA NGGTCAGAAA 

301 AAAATGGCNG GGGGCTTCAG CAAGGGTAAG CCGGACGGGG AGTGGGTCAA 

351 CTGGTATCCG AACGGTAAAA AATCTGCCGT TATGCCTTAT AAAAATGGTT 

401 TGAGTGAAGG TACGGGGTNN CGCTATTACC GTAACGGCGG CAAGGAAAGC 

451 GAAATCCAGT TTAAACAGAA TAAGGCAAAC GGCGTATGGA AGCAATGGTA

501 TGCCGACGGC AATATCAAAA CGGAAATGGT TATGGTCAAT GATGAGCCTG

551 CCAAAATTCT GACATGGGAT GAAAGCGGTC GATTACTCTC GGAACTGTCT

601 ATCCATCATC ATNAACGTAA TGGAGTAGTC TTAGAGTGGT ATGAAGATGG

651 TTCTAAAAAG ANTGAAGCTG TTTATCAGGA TGATAAGTTG GTCAGGAAAA

701 CCCAGTGGGA TAANGATGGT TATTTAATCG AACCCTGA

This encodes a protein having amino acid sequence <SEQ ID 702>: 

1 MKKLSRIVFS TVLLGFSAAL PAQXYSVYFN QNGKLTATXS SAAYIRQYSV

51 AEGIAHAQXF XYPSMKKYSE PYIVASTQIK SFVPTLQNGM LILWHFXGQK 

101 KMAGGFSKGK PDGEWVNWYP NGKKSAVMPY KNGLSEGTGX RYYRNGGKES 

151 EIQFKQNKAN GVWKQWYADG NIKTEMVMVN DEPAKILTWD ESGRLLSELS

201 IHHHXRNGVV LEWYEDGSKK XEAVYQDDKL VRKTQWDXDG YLIEP* 

ORF27a and ORF27-1 show 94.7% identity in 245 aa overlap: 

                    10        20        30        40        50        60
orf27a.pep  MKKLSRIVFSTVLLGFSAALPAQXYSVYFNQNGKLTATXSSAAYIRQYSVAEGIAHAQXF
            |||||||||||||||||||||||:|||||||||||||| |||||||||||: |||||| |
orf27-1     MKKLSRIVFSTVLLGFSAALPAQTYSVYFNQNGKLTATMSSAAYIRQYSVVAGIAHAQDF
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf27a.pep  XYPSMKKYSEPYIVASTQIKSFVPTLQNGMLILWHFXGQKKMAGGFSKGKPDGEWVNWYP
             ||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||
orf27-1     YYPSMKKYSEPYIVASTQIKSFVPTLQNGMLILWHFNGQKKMAGGFSKGKPDGEWVNWYP
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf27a.pep  NGKKSAVMPYKNGLSEGTGXRYYRNGGKESEIQFKQNKANGVWKQWYADGNIKTEMVMVN
            ||||||||||||||||||| ||||||||||||||||||||||||||||||:|||||||||
orf27-1     NGKKSAVMPYKNGLSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEMVMVN
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf27a.pep  DEPAKILTWDESGRLLSELSIHHHXRNGVVLEWYEDGSKKXEAVYQDDKLVRKTQWDXDG
            |||||||||||||||||||||:|| ||||||||||||||| |||||||||||||||| ||
orf27-1     DEPAKILTWDESGRLLSELSIRHHQRNGVVLEWYEDGSKKSEAVYQDDKLVRKTQWDKDG
                   190       200       210       220       230       240

orf27a.pep  YLIEPX
            ||||||
orf27-1     YLIEPX

Homology with a predicted ORF from N. gonorrhoeae

ORF27 shows 96.3% identity over 82 aa overlap with a predicted ORF (ORF27ng) from 
N. gonorrhoeae: 

orf27.pep                                 KQWYADXSIKTEMVMVNDEPAKILTWDESG 30
                                          |||||| |||||||||||||||||||||||
orf27ng     LSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEMVMVNDEPAKILTWDESG 193

orf27.pep   RLLSELSIRHHQRNGVVLEWYEDGSKKSEXVYQDDKLVRKTQWDKDGYLIEP 82
            |||||||||||:||||||||||||||||| ||||||||||||||||||||||
orf27ng     RLLSELSIRHHKRNGVVLEWYEDGSKKSEAVYQDDKLVRKTQWDKDGYLIEP 245

The complete length ORF27ng nucleotide sequence <SEQ ID 703> is:

1 ATGAAGAAAT TATCTCGGAT TGTATTTTCA ATCGTACTGT TGGGTTTTTC

51 GGCCGCTTTG CCGGCGCAGA CCTATTCTGT TTATTTTAAT CAGAACGGGA 

101 AACTGACGGC GACGATGTCT TCTGCCGCTT ATATCAGGCA ATATAGTGTG 

151 GCGGCGGGTA TCGCACACGC GCAGGATTTT TATTATCCGT CGATGAAGAA 

201 ATATTCCGAA CCTTATATCG TTGCTTCAAC GCAAATCAAA TCTTTTGTGC 

251 CTACCCTGCA AAACGGTATG TTGATTTTGT GGCATTTTAA TGGTCAGAAA 

301 AAAATGGCGG GGGGCTTCAG CAAGGGTAAG CCGGACGGGG AATGGGTCAA

351 CTGGTATCCG AACGGTAAAA AATCTGCGGT TATGCCTTAT AAAAATGGCT

401 TGAGTGAGGG TACGGGATAC CGTTATTACC GTAACGGCGG CAAGGAAAGC

451 GAAATCCAGT TTAAGCAAAA TAAGGCGAAC GGCGTATGGA AGCAATGGTA

501 TGCCGATGGA AGTATCAAGA CGGAAATGGT TATGGTCAAC GATGAGCCTG 

551 CCAAAATTCT GACTTGGGAT GAAAGCGGCC GATTACTTTC GGAACTGTCT 

601 ATCCGCCACC ATAAACGCAA CGGGGTGGTT TTGGAGTGGT ATGAAGATGG 

651 TTCTAAAAAG AGCGAGGCTG TTTATCAGGA TGACAAGTTG GTCAGGAAAA 

701 CCCAATGGGA TAAGGATGGT TATTTAATCG AACCCTGA

This encodes a protein having amino acid sequence <SEQ ID 704>: 

1 MKKLSRIVFS IVLLGFSAAL PAQTYSVYFN QNGKLTATMS SAAYIRQYSV

51 AAGIAHAQDF YYPSMKKYSE PYIVASTQIK SFVPTLQNGM LILWHFNGQK 

101 KMAGGFSKGK PDGEWVNWYP NGKKSAVMPY KNGLSEGTGY RYYRNGGKES 

151 EIQFKQNKAN GVWKQWYADG SIKTEMVMVN DEPAKILTWD ESGRLLSELS 

201 IRHHKRNGVV LEWYEDGSKK SEAVYQDDKL VRKTQWDKDG YLIEP* 

ORF27ng and ORF27-1 show 98.8% identity in 245 aa overlap: 

                    10        20        30        40        50        60
orf27-1.pep MKKLSRIVFSTVLLGFSAALPAQTYSVYFNQNGKLTATMSSAAYIRQYSVVAGIAHAQDF
            |||||||||| |||||||||||||||||||||||||||||||||||||||:|||||||||
orf27ng     MKKLSRIVFSIVLLGFSAALPAQTYSVYFNQNGKLTATMSSAAYIRQYSVAAGIAHAQDF
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf27-1.pep YYPSMKKYSEPYIVASTQIKSFVPTLQNGMLILWHFNGQKKMAGGFSKGKPDGEWVNWYP
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf27ng     YYPSMKKYSEPYIVASTQIKSFVPTLQNGMLILWHFNGQKKMAGGFSKGKPDGEWVNWYP
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf27-1.pep NGKKSAVMPYKNGLSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEMVMVN
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf27ng     NGKKSAVMPYKNGLSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEMVMVN
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf27-1.pep DEPAKILTWDESGRLLSELSIRHHQRNGVVLEWYEDGSKKSEAVYQDDKLVRKTQWDKDG
            ||||||||||||||||||||||||:|||||||||||||||||||||||||||||||||||
orf27ng     DEPAKILTWDESGRLLSELSIRHHKRNGVVLEWYEDGSKKSEAVYQDDKLVRKTQWDKDG
                   190       200       210       220       230       240

orf27-1.pep YLIEPX
            ||||||
orf27ng     YLIEPX

Based on this analysis, including the putative leader sequence in the gonococcal protein, it was 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF27-1 (24.5 kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
17A shows the results of affinity purification of the GST-fusion protein, and Figure 17B shows the
results of expression of the His-fusion in E. coli. Purified GST-fusion protein was used to immunise
mice, whose sera were used for ELISA, which gave a positive result, confirming that ORF27-1 is
a surface-exposed protein and a useful immunogen.
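
When interpreting SDS-PAGE bands it can help to estimate the mass expected from an amino acid sequence. The sketch below is an illustrative calculation only, using standard average residue masses plus one water molecule; the function name `average_mass` is hypothetical. The result for a full-length sequence need not coincide with the 24.5 kDa quoted above, since the text does not state how that value was obtained, and fusion tags or removal of a leader peptide would change the observed size.

```python
# Average residue masses in daltons (standard textbook values)
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # one water is added for the free N- and C-termini

def average_mass(protein):
    """Estimate the average molecular mass (Da) of an unmodified chain.

    Spaces and a trailing stop symbol '*' are ignored; an ambiguous
    residue such as X raises KeyError because its mass is undefined.
    """
    seq = protein.replace("*", "").replace(" ", "").upper()
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


if __name__ == "__main__":
    print(round(average_mass("MKKLSRIVFS TVLLGFSAAL") / 1000.0, 2), "kDa")
```

The same function can be applied to a mature sequence with a predicted leader removed, which gives a rough sense of how much processing would shift the band.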

Example 84 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 705>: 

1 ATGAAATTTA CCAAGCACCC CGTCTGGGCA ATGGCGTTCC GCCCATTTTA 

51 TTCGCTGGCG GCTCTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG 

101 GCTACACGGG AACGCACkAG CTGTCCGGTT TCTATTGGCA CGCGCATGAg 

151 ATGATTTGGG GTTATGCCGG ACTGGTCGTC ATCGCCTTCC TGCTGACCGC 

201 CGTCGCCACT TGGACGGGGC AGCCGCCCAC GCGGGGCGGC GTaTCTGGTC 

251 GGCTTGACTA TCTTTTGGCT GGCTGCGCGG ATTGCCGCCT TTATCCCGGG 

301 TTGGGGTGCG TCGGCAAGCG GCATACTCGG TACGCTGTTT TTCTGGTACG 

351 GCGCGGTGTG CATGGCTTTG CCCGTTATCC GTTCGCAGAA TCAACGCAAC 

401 TATGTTgCCG TGTTCGCGCT GTTCGTCTTG GGCGGCACGC ATGCGGCGTT

451 CCACGTCCAG CTGCACAACG GCAACCTAGG CGGACTCTTG AGCGGATTGC

501 AGTCGGGCTT GGTGATG 

This corresponds to the amino acid sequence <SEQ ID 706; ORF47>: 

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHX LSGFYWHAHE

51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTIFWL AARIAAFIPG

101 WGASASGILG TLFFWYGAVC MALPVIRSQN QRNYVAVFAL FVLGGTHAAF 

151 HVQLHNGNLG GLLSGLQSGL VM 

Further work revealed the complete nucleotide sequence <SEQ ID 707>: 

1 ATGAAATTTA CCAAGCACCC CGTCTGGGCA ATGGCGTTCC GCCCATTTTA 

51 TTCGCTGGCG GCTCTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG 

101 GCTACACGGG AACGCACGAG CTGTCCGGTT TCTATTGGCA CGCGCATGAG 

151 ATGATTTGGG GTTATGCCGG ACTGGTCGTC ATCGCCTTCC TGCTGACCGC 

201 CGTCGCCACT TGGACGGGGC AGCCGCCCAC GCGGGGCGGC GTTCTGGTCG 

251 GCTTGACTAT CTTTTGGCTG GCTGCGCGGA TTGCCGCCTT TATCCCGGGT 

301 TGGGGTGCGT CGGCAAGCGG CATACTCGGT ACGCTGTTTT TCTGGTACGG 

351 CGCGGTGTGC ATGGCTTTGC CCGTTATCCG TTCGCAGAAT CAACGCAACT 

401 ATGTTGCCGT GTTCGCGCTG TTCGTCTTGG GCGGCACGCA TGCGGCGTTC

451 CACGTCCAGC TGCACAACGG CAACCTAGGC GGACTCTTGA GCGGATTGCA

501 GTCGGGCTTG GTGATGGTGT CGGGTTTTAT CGGTCTGATT GGTACGCGGA 

551 TTATTTCGTT TTTTACGTCC AAACGCTTGA ATGTGCCGCA GATTCCCAGT 

601 CCGAAATGGG TGGCGCAGGC TTCGCTGTGG CTGCCCATGC TGACTGCCAT 

651 GCTGATGGCG CACGGTGTGT TGGCTTGGCT GTCTGCCGTT TTTGCCTTTG 

701 CGGCAGGTGT GATTTTTACC GTGCAGGTGT ACCGCTGGTG GTATAAACCC 

751 GTGTTGAAAG AGCCGATGCT GTGGATTCTG TTTGCCGGCT ATCTGTTTAC 

801 CGGATTGGGG CTGATTGCGG TCGGCGCGTC TTATTTCAAA CCCGCTTTCC 

851 TCAATCTGGG TGTGCATCTG ATCGGGGTCG GCGGTATCGG CGTGCTGACT 

901 TTGGGCATGA TGGCGCGTAC CGCGCTTGGT CATACGGGCA ATCCGATTTA 
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951 TCCGCCGCCC AAAGCCGTTC CCGTTGCGTT TTGGCTGATG ATGGCGGCAA 

1001 CCGCCGTCCG TATGGTTGCC GTATTTTCTT CCGGCACTGC CTACACGCAC 

1051 AGCATCCGCA CCTCTTCGGT TTTGTTTGCA CTCGCGCTTT TGGTGTATGC 

1101 GTGGAAGTAT ATTCCTTGGC TGATTCGTCC GCGTTCGGAC GGCAGGCCCG 

1151 GTTGA

This corresponds to the amino acid sequence <SEQ ID 708; ORF47-1>:

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE

51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTIFWL AARIAAFIPG

101 WGASASGILG TLFFWYGAVC MALPVIRSQN QRNYVAVFAL FVLGGTHAAF

151 HVQLHNGNLG GLLSGLQSGL VMVSGFIGLI GTRIISFFTS KRLNVPQIPS

201 PKWVAQASLW LPMLTAMLMA HGVLAWLSAV FAFAAGVIFT VQVYRWWYKP

251 VLKEPMLWIL FAGYLFTGLG LIAVGASYFK PAFLNLGVHL IGVGGIGVLT

301 LGMMARTALG HTGNPIYPPP KAVPVAFWLM MAATAVRMVA VFSSGTAYTH

351 SIRTSSVLFA LALLVYAWKY IPWLIRPRSD GRPG*
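
The correspondence between a listed nucleotide sequence and its encoded protein can be checked mechanically. The following is a minimal sketch in Python, assuming the standard genetic code and a reading frame that starts at nucleotide 1; the function name `translate` is illustrative only and is not part of the original disclosure.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG x TCAG x TCAG order.
# Assumption: the standard table applies to the ORFs listed here.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; any codon not in the table becomes 'X'."""
    dna = "".join(dna.split()).upper()  # drop the spacing used in the listing
    usable = len(dna) - len(dna) % 3    # ignore a trailing partial codon
    codons = (dna[i:i + 3] for i in range(0, usable, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# First 30 nucleotides of SEQ ID 707 as listed above.
print(translate("ATGAAATTTA CCAAGCACCC CGTCTGGGCA"))  # MKFTKHPVWA
```

The printed residues agree with the first ten residues of SEQ ID 708.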

Computer analysis of this amino acid sequence predicts a leader peptide and also gave the
following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF47 shows 99.4% identity over a 172aa overlap with an ORF (ORF47a) from strain A of N. 
meningitidis: 

                    10        20        30        40        50        60
orf47.pep   MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHXLSGFYWHAHEMIWGYAGLVV
            ||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||
orf47a      MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf47.pep   IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47a      IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC
                    70        80        90       100       110       120

                   130       140       150       160       170
orf47.pep   MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVM
            ||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47a      MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI
                   130       140       150       160       170       180

orf47a      GTRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAMLMAHGVMPWLSAAFAFAAGVIFT
                   190       200       210       220       230       240

The complete length ORF47a nucleotide sequence <SEQ ID 709> is:

1 ATGAAATTTA CCAAGCACCC CGTTTGGGCA ATGGCGTTCC GCCCGTTTTA 

51 TTCACTGGCG GCTCTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG 

101 GCTACACGGG AACGCACGAG CTGTCCGGTT TCTATTGGCA CGCGCATGAG 

151 ATGATTTGGG GTTATGCCGG ACTGGTCGTC ATCGCCTTCC TGCTGACCGC 

201 CGTCGCCACT TGGACGGGGC AGCCGCCCAC GCGGGGCGGC GTTCTGGTCG

251 GCTTGACTAT CTTTTGGCTG GCTGCGCGGA TTGCCGCCTT TATCCCGGGT 

301 TGGGGTGCGT CGGCAAGCGG CATACTCGGT ACGCTGTTTT TCTGGTACGG 

351 CGCGGTGTGC ATGGCTTTGC CCGTTATCCG TTCGCAGAAT CAACGCAATT 

401 ATGTTGCCGT GTTCGCGCTG TTCGTCTTGG GCGGTACGCA CGCGGCGTTC

451 CACGTCCAGC TGCACAACGG CAACCTAGGC GGACTCTTGA GCGGATTGCA

501 GTCGGGCTTG GTGATGGTGT CGGGTTTTAT CGGTCTGATT GGTACGCGGA 

551 TTATTTCGTT TTTTACGTCC AAACGGTTGA ATGTGCCGCA GATTCCCAGT 

601 CCGAAATGGG TGGCGCAGGC TTCGCTGTGG CTGCCCATGC TGACCGCCAT 

651 GCTGATGGCG CACGGCGTGA TGCCTTGGCT GTCGGCGGCT TTCGCGTTTG 

701 CGGCAGGTGT GATTTTTACC GTGCAGGTGT ACCGCTGGTG GTATAAGCCT

751 GTGTTGAAAG AGCCGATGCT GTGGATTCTG TTTGCCGGCT ATCTGTTTAC

801 CGGATTGGGG CTGATTGCGG TCGGCGCGTC TTATTTCAAA CCCGCTTTCC 

851 TCAATCTGGG TGTGCATCTG ATCGGGGTCG GCGGTATCGG CGTGCTGACT 
901 TTGGGCATGA TGGCGCGTAC CGCGCTCGGT CATACGGGCA ATCCGATTTA

951 TCCGCCGCCC AAAGCCGTTC CCGTTGCGTT TTGGCTGATG ATGGCGGCAA 

1001 CCGCCGTCCG TATGGTTGCC GTATTTTCTT CCGGCACTGC CTACACGCAC 

1051 AGCATACGCA CCTCTTCGGT TTTGTTTGCA CTCGCGCTTT TGGTGTATGC 

1101 GTGGAAGTAT ATTCCTTGGC TGATTCGTCC GCGTTCGGAC GGCAGGCCCG 

1151 GTTGA 

This encodes a protein having amino acid sequence <SEQ ID 710>: 

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE

51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTIFWL AARIAAFIPG

101 WGASASGILG TLFFWYGAVC MALPVIRSQN QRNYVAVFAL FVLGGTHAAF

151 HVQLHNGNLG GLLSGLQSGL VMVSGFIGLI GTRIISFFTS KRLNVPQIPS

201 PKWVAQASLW LPMLTAMLMA HGVMPWLSAA FAFAAGVIFT VQVYRWWYKP

251 VLKEPMLWIL FAGYLFTGLG LIAVGASYFK PAFLNLGVHL IGVGGIGVLT

301 LGMMARTALG HTGNPIYPPP KAVPVAFWLM MAATAVRMVA VFSSGTAYTH

351 SIRTSSVLFA LALLVYAWKY IPWLIRPRSD GRPG*

ORF47a and ORF47-1 show 99.2% identity in 384 aa overlap: 

                    10        20        30        40        50        60
orf47a.pep  MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47-1     MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf47a.pep  IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47-1     IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf47a.pep  MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47-1     MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf47a.pep  GTRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAMLMAHGVMPWLSAAFAFAAGVIFT
            |||||||||||||||||||||||||||||||||||||||||||: ||||:||||||||||
orf47-1     GTRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAMLMAHGVLAWLSAVFAFAAGVIFT
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf47a.pep  VQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYFKPAFLNLGVHLIGVGGIGVLT
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47-1     VQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYFKPAFLNLGVHLIGVGGIGVLT
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf47a.pep  LGMMARTALGHTGNPIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47-1     LGMMARTALGHTGNPIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA
                   310       320       330       340       350       360

                   370       380
orf47a.pep  LALLVYAWKYIPWLIRPRSDGRPGX
            |||||||||||||||||||||||||
orf47-1     LALLVYAWKYIPWLIRPRSDGRPGX
                   370       380
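
The identity figure quoted for an alignment of this kind follows from a simple column count. As a minimal sketch (the function name `percent_identity` is hypothetical, and an ungapped alignment is assumed), the block spanning residues 181-240 is the only one in which ORF47a and ORF47-1 differ:

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned columns holding the same residue (no gaps assumed)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Residues 181-240 of each protein, taken from SEQ ID 710 and SEQ ID 708.
orf47a = "GTRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAMLMAHGVMPWLSAAFAFAAGVIFT"
orf47_1 = "GTRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAMLMAHGVLAWLSAVFAFAAGVIFT"

mismatches = sum(x != y for x, y in zip(orf47a, orf47_1))
print(mismatches)                                    # 3
print(round(percent_identity(orf47a, orf47_1), 1))   # 95.0 for this block alone
print(round(100.0 * (384 - mismatches) / 384, 1))    # 99.2 over the 384 aa overlap
```

Three differing positions over a 384 aa overlap give the 99.2% identity stated for ORF47a and ORF47-1.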

Homology with a predicted ORF from N. gonorrhoeae 

ORF47 shows 97.1% identity over 172 aa overlap with a predicted ORF (ORF47ng) from 
N. gonorrhoeae: 

ORF47       MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV 60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ORF47ng     MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV 60

ORF47       IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC 120
            |||||||||||||||||||||||||| ||||||||||||||||:||||||||||||||||
ORF47ng     IAFLLTAVATWTGQPPTRGGVLVGLTAFWLAARIAAFIPGWGAAASGILGTLFFWYGAVC 120

ORF47       MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVM 172
            ||||||||||:||||||||:||||||||||||||||||||||||||||||||
ORF47ng     MALPVIRSQNRRNYVAVFAIFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVWGFIGLI 180

The ORF47ng nucleotide sequence <SEQ ID 711> is predicted to encode a protein comprising
amino acid sequence <SEQ ID 712>: 

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE

51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTAFWL AARIAAFIPG

101 WGAAASGILG TLFFWYGAVC MALPVIRSQN RRNYVAVFAI FVLGGTHAAF

151 HVQLHNGNLG GLLSGLQSGL VMVWGFIGLI GMKIISFFTS KRLKLPQIPS

201 PKWVAHASLW LPMLNAILMA HRVMPWLSAA FPFAAGVIFT VQVYAGGITP

251 IEETSCGSVA GICYRLGNSS G

The predicted leader peptide and transmembrane domains are identical (except for an Ile/Ala
substitution at residue 87 and a Leu/Ile substitution at position 140) to sequences in the
meningococcal protein (see also Pseudomonas stutzeri orf396, accession number e246540):

TM segments in ORF47ng

INTEGRAL    Likelihood = -5.63    Transmembrane  52 - 68
INTEGRAL    Likelihood = -3.88    Transmembrane 169 - 185
INTEGRAL    Likelihood = -3.08    Transmembrane  82 - 98
INTEGRAL    Likelihood = -1.91    Transmembrane 134 - 150
INTEGRAL    Likelihood = -1.44    Transmembrane 107 - 123
INTEGRAL    Likelihood = -1.38    Transmembrane 227 - 243
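
A rough hydropathy scan gives an independent feel for why a segment such as residues 52-68 is flagged as membrane-spanning. The sketch below uses the published Kyte-Doolittle scale with a sliding window; it is not the program that produced the likelihood values above and does not reproduce them. The window length of 17, the threshold of 1.6 and all function names are assumptions chosen for illustration.

```python
# Kyte-Doolittle hydropathy values (published scale). This is an illustrative
# stand-in, not the method behind the "Likelihood" figures in the table.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(segment: str) -> float:
    """Average Kyte-Doolittle value over a segment."""
    return sum(KD[res] for res in segment) / len(segment)


def hydrophobic_windows(protein: str, window: int = 17, threshold: float = 1.6):
    """Yield (start, end, score) for windows at or above the threshold (1-based)."""
    for i in range(len(protein) - window + 1):
        score = mean_hydropathy(protein[i:i + window])
        if score >= threshold:
            yield i + 1, i + window, round(score, 2)


# Residues 1-100 of ORF47ng (SEQ ID 712).
orf47ng_1_100 = (
    "MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHE"
    "MIWGYAGLVVIAFLLTAVATWTGQPPTRGGVLVGLTAFWLAARIAAFIPG"
)

segment_52_68 = orf47ng_1_100[51:68]  # residues 52-68, first row of the table
print(segment_52_68, mean_hydropathy(segment_52_68) > 1.6)
```

Under these assumptions the 52-68 window scores well above the threshold, which is consistent with its listing as a transmembrane segment.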



Further work revealed the complete gonococcal DNA sequence <SEQ ID 713>: 

1 ATGAAATTTA CCAAACATCC CGTCTGGGCA ATGGCGTTCC GCCCGTTTTA 

51 TTCACTGGCG GCACTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG 

101 GCTACACGGG AACGCACGAG CTGTCCGGTT TCTATTGGCA CGCGCATGAG 

151 ATGATTTGGG GTTATGCCGG TCTCGTCGTC ATCGCCTTCC TGCTGACCGC 

201 CGTCGCCACT TGGACGGGAC AGCCGCCCAC GAGGGGCGGC GTTCTGGTCG 

251 GCTTGACCGC CTTTTGGCTG GCTGCGCGGA TTGCCGCCTT TATCCCGGGT 

301 TGGGGTGCGG CGGCAAGCGG CATACTCGGT ACGCTGTTTT TCTGGTACGG 

351 CGCGGTGTGC ATGGCTTTGC CCGTTATCCG TtcgCAAAAC CGGCGCAACT 

401 ATGtcgCCGT ATTCGCAATA TTTGTGCTGG GCGGTACGCA TGCGgcgTTC 

451 CACGtccAgc tGCACAACGG CAACCTAGGC GGACTCTTGA GCGGATTGCA

501 GTCGGGCCTG GTTATGGTGT CGGGCTTTAT CGGCCTGATT GGGATGAGGA 

551 TTATTTCGTT TTTTACGTCC AAACGGTTGA ACGTGCCGCA GATTCCCAGT 

601 CCGAAATGGG TGGCGCAGGC TTCGCTGTGG CTACCCATGC TGACCGCCAT 

651 ACTGATGGCG CACGGCGTGA TGCCTTGGCT GTCGGCGGCT TTCGCGTTTG 

701 CGGCGGGCGT GATTTTTACC GTACAGGTGT ACCGCTGGTG GTATAAACCC 

751 GTATTGAAAG AACCGATGCT GTGGATTCTG TTTGCCGGCT ATCTGTTTAC

801 CGGATTGGGG CTGATTGCGG TCGGCGCGTC TTATTTCAAA CCTGCCTTCC 

851 TCAATCTGGG CGTACATCTG ATCGGGGTCG GCGGTATCGG CGTGCTGACT 

901 TTGGGCATGA TGGCGCGTAC CGCGCTCGGT CATACGGGCA ATTCGATTTA 

951 TCCGCCGCCC AAAGCCGTTC CCGTTGCGTT TTGGCTGATG ATGGCGGCAA 

1001 CCGCCGTCCG TATGGTTGCC GTATTTTCTT CCGGCACTGC CTACACGCAC 

1051 AGCATCCGCA CGTCTTCGGT TTTGTTTGCA CTCGCGCTGC TGGTGTATGC 

1101 GTGGAAATAC ATTCCGTGGC TGATCCGTCC GCGTTCGGAC GGCAGGCCCG 

1151 GTTGA 

This encodes a protein having amino acid sequence <SEQ ID 714; ORF47ng-1>:

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE

51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTAFWL AARIAAFIPG

101 WGAAASGILG TLFFWYGAVC MALPVIRSQN RRNYVAVFAI FVLGGTHAAF

151 HVQLHNGNLG GLLSGLQSGL VMVSGFIGLI GMRIISFFTS KRLNVPQIPS

201 PKWVAQASLW LPMLTAILMA HGVMPWLSAA FAFAAGVIFT VQVYRWWYKP

251 VLKEPMLWIL FAGYLFTGLG LIAVGASYFK PAFLNLGVHL IGVGGIGVLT

301 LGMMARTALG HTGNSIYPPP KAVPVAFWLM MAATAVRMVA VFSSGTAYTH

351 SIRTSSVLFA LALLVYAWKY IPWLIRPRSD GRPG*

ORF47ng-1 and ORF47-1 show 97.4% identity in 384 aa overlap:

                    10        20        30        40        50        60
orf47-1.pep MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47ng-1   MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf47-1.pep IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC
            |||||||||||||||||||||||||| ||||||||||||||||:||||||||||||||||
orf47ng-1   IAFLLTAVATWTGQPPTRGGVLVGLTAFWLAARIAAFIPGWGAAASGILGTLFFWYGAVC
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf47-1.pep MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI
            ||||||||||:||||||||:||||||||||||||||||||||||||||||||||||||||
orf47ng-1   MALPVIRSQNRRNYVAVFAIFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf47-1.pep GTRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAMLMAHGVLAWLSAVFAFAAGVIFT
            | ||||||||||||||||||||||||||||||||||:||||||: ||||:||||||||||
orf47ng-1   GMRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAILMAHGVMPWLSAAFAFAAGVIFT
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf47-1.pep VQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYFKPAFLNLGVHLIGVGGIGVLT
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47ng-1   VQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYFKPAFLNLGVHLIGVGGIGVLT
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf47-1.pep LGMMARTALGHTGNPIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA
            |||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||
orf47ng-1   LGMMARTALGHTGNSIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA
                   310       320       330       340       350       360

                   370       380
orf47-1.pep LALLVYAWKYIPWLIRPRSDGRPGX
            |||||||||||||||||||||||||
orf47ng-1   LALLVYAWKYIPWLIRPRSDGRPGX
                   370       380



Furthermore, ORF47ng-1 shows significant homology to an ORF from Pseudomonas stutzeri:



gnl|PID|e246540 (Z73914) ORF396 protein [Pseudomonas stutzeri] Length = 396
Score = 155 bits (389), Expect = 5e-37
Identities = 121/391 (30%), Positives = 169/391 (42%), Gaps = 21/391 (5%)

Query: 7    PVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFY-------WHAHEMIWGYAGLV 59
            P+W +AFRPF+   +LY  L++ LW   +TG     GF        WH HEM++G+A  +
Sbjct: 14   PIWRLAFRPFFLAGSLYALLAIPLWVAAWTGLWP--GFQPTGGWLAWHRHEMLFGFAMAI 71

Query: 60   VIAFLLTAVATWTGQPPTRGGVLVGLTAFWLAARIAAFIPGWGAAASGILGTLFFWYGAV 119
            V  FLLTAV TWTGQ    G  LVGL A WLAAR+  ++ G  AA    L  LF
Sbjct: 72   VAGFLLTAVQTWTGQTAPSGNRLVGLAAVWLAARL-GWLFGLPAAWLAPLDLLFLVALVW 130


Query: 


120 


Sbjct: 


131 


Query: 


180 



MA 



+ +RNY 



+ ++ G 



IG R+I FFT + L 



W+ A L 



+ A+L A GV 



+V+ + L 



- -MPWLS AAFAFA 234 
PL FA 
Sbjct: 191 IGGRVIPFFTQRGLGKVDAVKPWWLDVALLVGTGVIALLHAFGVAMRPQPLLGLLFV-A 249

Query: 235 AGVIFTVQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYF-KPAFXXXXXXXXXX 293 

GV +++ RW+ K + K +LW L L+ + + +F A 
Sbjct: 250 IGVGHLLRLMRWYDKGIWKVGLLWSLHVAMLWLWAAFGLALWHFGLLAQSSPSLHALSV 309 

Query: 294 XXXXXXXXXMMARTALGHTGNSIYPPPKAVPVAFWLXXXXXXXXXXXXFSSGTAYTHSIR 353 

M+AR LGHTG + P+AFL FS + 

Sbjct: 310 GSMSGLILAMIARVTLGHTGRPLQLPAGIIG-AFVL FNLGTAARVFLSVAWPVGGLW 3 65 

Query: 354 TSSVLFALALLVYAWKYIPWLIRPRSDGRPG 384
            ++V + LA  +Y W+Y P L+  R DG PG
Sbjct: 366 LAAVCWTLAFALYVWRYAPMLVAARVDGHPG 396

Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 85 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 715>:

1 . . ATGCCGTCTG AAGGTTCAGA CGGCmTCGGT GyCGGGGAAy CAGAAGyGGT 

51 AGCGCATGCC CAATGAGACT TCGTGGGTTT TGAAGCGGGT GTTTTCCAAG 

101 CGTCCCCAGT TGTGGTAACG GTATCCGGTG TCyAArGTCA GCTTGGGyGT 

151 GATGTCGAAa CCGACACCGG CGATGACACC AAGACCyAmG CTGCTGATrC 

201 TGTkGCTTTC GTGATAGGsA GGTTTGyTGG kinks As yTTG TAyrATwkkG 

251 CCTssCwsTG kAGmGCCkTk CkyTGGTkkA swGrwArTAG TCGTGGTTTy 

301 TkTTyyCACC GAATGAACyT GATGTTTAAC GTGTCCGTAG GCGACGCGCG 

351 CGCCGATATA GGGTTTGAAT TTATCGTTGA GTTTGAAATC GTAAATGGCG

401 GACAAGCCGA GAGAAGAAAC GGCGTGGAAG CTGCCGTTTC CCTGATGTTT

451 TGTTTGGGTT TCTTTGTAGT TGTTGTTTAT CTCTTCAGTA ACTTTTTTAG

501 TAGAAGAATT ACTTTCTTTC CATTTTCTGT AACTGGCATA ATCTGCCGCT 

551 ATTCTCCAGC CGCCGAAATC . . 

This corresponds to the amino acid sequence <SEQ ID 716; ORF67>: 

1 ..MPSEGSDGXG XGEXEXVAHA QXDFVGFEAG VFQASPVVVT VSGVXXQLGX

51 DVETDTGDDT KTXAADXVAF VIGRFXGXXL YXXAXXXXAX XWXXXXSRGF

101 XXHRMNLMFN VSVGDARADI GFEFIVEFEI VNGGQAERRN GVEAAVSLMF

151 CLGFFVVVVY LFSNFFSRRI TFFPFSVTGI ICRYSPAAEI ..
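
The X residues in this partial sequence arise wherever the nucleotide listing contains IUPAC ambiguity codes (m, y, r, k, s, w) that leave a codon compatible with more than one amino acid. The following is a minimal sketch of that behaviour, assuming the standard genetic code; `translate_ambiguous` is a hypothetical helper, not a tool referred to in this document.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG x TCAG x TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_ambiguous(dna: str) -> str:
    """Translate frame 1; a codon that could encode several residues gives 'X'."""
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        # Expand the codon into every concrete codon it may represent.
        options = product(*(IUPAC[base] for base in dna[i:i + 3]))
        residues = {CODON_TABLE["".join(codon)] for codon in options}
        protein.append(residues.pop() if len(residues) == 1 else "X")
    return "".join(protein)


# First 51 nucleotides of SEQ ID 715 as listed above.
print(translate_ambiguous(
    "ATGCCGTCTG AAGGTTCAGA CGGCmTCGGT GyCGGGGAAy CAGAAGyGGT A"
))  # MPSEGSDGXGXGEXEXV
```

The output matches the first 17 residues of ORF67: for example, the codon GyC may be GCC (Ala) or GTC (Val), so the residue is reported as X.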

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. gonorrhoeae 

ORF67 shows 51.8% identity over 199 aa overlap with a predicted ORF (ORF67ng) from 
N. gonorrhoeae:

orf 67 .pep MPSEGSDGXGXGEXE X V AHAQX D FVG FE AG 30 

MINIM I II | Mill I M I M I 
orf 67ng TNFEIAVLSGMTVRVFYCARPAPVNGGRLKMPSEGSDGIGIGESEAVAHAQRGFVGFEAG 14 6 

90 100 110 120 130 140 

orf 67 . pep VFQAS PWVTVSGVXXQLGXDVETDTGDDTKTXAADXVAFVIGRFXGXXLYXXAXXXXAX 90 

i I M II I II : I : I f lilt:: : : : I I I I f If I : : 

orf 67ng V FQAS PVVVAVAGVQGQAGRDVYAHARHRAEAQAAAAVAFLI GV FLRMS VRINRNCCV S I 206 

orf 67 . pep X WXXXX S R G FXX H RMN LM FN V S VG D ARAD I G FE FI VE FE I VN GG Q AE RRN G VE AA V S LM F 150 

I : I : : : I I II M I : II I I I I : I I I I I I I I M I I II I M I II Ml 
orf 67ng TRVGGKSTC Y FFSR I DAVS D VS VG DART D IG FE FWE FE I VNGGQAERRNGVE CAV FLMF 266 
orf67.pep CLGFFVVVVYLFSNFFSRRITFF-PFSVTGIICRYSPAAEI 190

| || : : | : | : : I : I I I 111 I : 1 I I I : 

orf67ng RLLVFYVKLVAAKSFIILSFQLFYVHGIFIWPFPVTGIIRGDAPAAEVVADRHPGVDGM 326 

The ORF67ng nucleotide sequence <SEQ ID 717> is predicted to encode a protein comprising
amino acid sequence <SEQ ID 718>:

1 MPSETVGSIV NVGVDESVGF sppfpsiqhf yrfhrihrir lfrppgpmql
51 NRHSHGSGNL GRGVVATVLS DKFPCGQVRI PACAGMTNFE IAVLSGMTVR
101 VFYCARPAPV NGGRLKMPSE GSDGIGIGES EAVAHAQRGF VGFEAGVFQA
151 SPVVVAVAGV QGQAGRDVYA HARHRAEAQA AAAVAFLIGV FLRMSVRINR
201 NCCVSITRVG GKSTCYFFSR IDAVSDVSVG DARTDIGFEF VVEFEIVNGG
251 QAERRNGVEC AVFLMFRLLV FYVKLVAAKS FIILSFQLFY VHGIFIVVPF
301 PVTGIIRGDA PAAEVVADRH PGVDGMRTDV SEIIAYRAYF VFAWSGWFRI
351 IVGKAFGGVG *

Based on the presence of several putative transmembrane domains in the gonococcal protein, it
is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 86 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 719>:

1 ATGTTTGCTT TTTTAGAAGC CTTTTTTGTC GAATACGGTT ATGCGGCTGT 

51 TTTTTTTGTA TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAGGATT 

101 TGACCTTGGT AACAGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG 

151 CATATTATGT TTGCAGTCGG TATGCTCGGC GTATTGGTCG GGGACGGCAT 

201 CATGTTCGCC GCCGGACGAA TTTGGGGGCA GArArTCCTA rGGTTCArAC 

251 CTATTGCGsG CATCATGACG CCGrAACGTT ATGAGCAGGT TCAGGAAAAA 

301 TTCGACAAAT ACGGTAACTG GGTCTTATTT GTCGCCCGTT TCCTGCCCGG 

351 TTTGAGAACG GCCGTATTTG TTACAGCCGG TATCAGCCGC AAGGTTTCAT 

401 ACTTGCGTTT TATCATTATG GATGGACTGG CCGCA...

This corresponds to the amino acid sequence <SEQ ID 720; ORF78>: 



1 MFAFLEAFFV EYGYAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP
51 HIMFAVGMLG VLVGDGIMFA AGRIWGQXXL XFXPIAXIMT PXRYEQVQEK
101 FDKYGNWVLF VARFLPGLRT AVFVTAGISR KVSYLRFIIM DGLAA...

Further work revealed the complete nucleotide sequence <SEQ ID 721>:



1 ATGTTTGCTT TTTTAGAAGC CTTTTTTGTC GAATACGGTT ATGCGGCTGT 

51 TTTTTTTGTA TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAGGATT 

101 TGACCTTGGT AACAGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG 

151 CATATTATGT TTGCAGTCGG TATGCTCGGC GTATTGGTCG GGGACGGCAT 

201 CATGTTCGCC GCCGGACGAA TTTGGGGGCA GAAAATCCTA AGGTTCAAAC

251 CTATTGCGCG CATCATGACG CCGAAACGTT ATGAGCAGGT TCAGGAAAAA 

301 TTCGACAAAT ACGGTAACTG GGTCTTATTT GTCGCCCGTT TCCTGCCCGG 

351 TTTGAGAACG GCCGTATTTG TTACAGCCGG TATCAGCCGC AAGGTTTCAT 

401 ACTTGCGTTT TATCATTATG GATGGACTGG CCGCACTGAT TTCCGTCCCT

451 ATTTGGATTT ATCTGGGCGA ATACGGTGCG CACAACATCG ATTGGCTGAT

501 GGCGAAAATG CACAGCCTGC AATCGGGTAT TTTTGTTATC TTGGGTATAG 

551 GTGCGACCGT TGTCGCTTGG ATTTGGTGGA AAAAACGCCA ACGTATCCAG 

601 TTTTACCGCA GCAAATTGAA AGAAAAGCGG GCGCAACGCA AAGCCGCCAA 

651 GGCAGCCAAA AAAGCCGCGC AAAGCAAACA ATAA 

This corresponds to the amino acid sequence <SEQ ID 722; ORF78-1>:



1 MFAFLEAFFV EYGYAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP

51 HIMFAVGMLG VLVGDGIMFA AGRIWGQKIL RFKPIARIMT PKRYEQVQEK

101 FDKYGNWVLF VARFLPGLRT AVFVTAGISR KVSYLRFIIM DGLAALISVP

151 IWIYLGEYGA HNIDWLMAKM HSLQSGIFVI LGIGATVVAW IWWKKRQRIQ



201 FYRSKLKEKR AQRKAAKAAK KAAQSKQ* 

Computer analysis of this amino acid sequence predicts several transmembrane domains, and also 
gave the following results: 

Homology with the dedA homologue of H.influenzae (accession number P45280) 
ORF78 and the dedA homologue show 58% aa identity in 144aa overlap:

Orf78: 4    FLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGM--GYTNPHIMFAVGMLGV 61
            FL  FF EYGY AV FVL+ICGFGVPIPED+TLV+GGVI+G+     N H+M  V M+GV
DedA: 20    FLIGFFTEYGYWAVLFVLIICGFGVPIPEDITLVSGGVIAGLYPENVNSHLMLLVSMIGV 79

Orf78: 62   LVGDGIMFAAGRIWGQXXLXFXPIAXIMTPXRYEQVQEKFDKYGNWVLFVARFLPGLRTA 121
            L GD  M+  GRI+G   L F PI  I+T  R   V+EKF +YGN VLFVARFLPGLR
DedA: 80    LAGDSCMYWLGRIYGTKILRFRPIRRIVTLQRLRMVREKFSQYGNRVLFVARFLPGLRAP 139

Orf78: 122  VFVTAGISRKVSYLRFIIMDGLAA 145
            +++ +GI+R+VSY+RF+++D  AA
DedA: 140   IYMVSGITRRVSYVRFVLIDFCAA 163

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF78 shows 93.8% identity over a 145aa overlap with an ORF (ORF78a) from strain A of N. 
meningitidis:

                    10        20        30        40        50        60
orf78.pep   MFAFLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG
            |||:||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf78a      MFALLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf78.pep   VLVGDGIMFAAGRIWGQXXLXFXPIAXIMTPXRYEQVQEKFDKYGNWVLFVARFLPGLRT
            |||||||||||||||||  | | ||| |||| || |||||||||||||||||||||||||
orf78a      VLVGDGIMFAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGNWVLFVARFLPGLRT
                    70        80        90       100       110       120

                   130       140
orf78.pep   AVFVTAGISRKVSYLRFIIMDGLAA
            |||||||||||||||||:|||||||
orf78a      AVFVTAGISRKVSYLRFLIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIA
                   130       140       150       160       170       180

The complete length ORF78a nucleotide sequence <SEQ ID 723> is: 

1 ATGTTTGCCC TTTTGGAAGC CTTTTTTGTC GAATACGGCT ATGCGGCCGT 

51 GTTTTTCGTT TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAGGATT

101 TGACCTTGGT AACAGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG 

151 CATATTATGT TTGCAGTCGG TATGCTCGGC GTATTGGTCG GGGACGGCAT

201 CATGTTCGCC GCCGGACGCA TCTGGGGGCA GAAAATCCTC AAGTTCAAAC 

251 CGATTGCGCG CATCATGACG CCGAAACGTT ACGCACAGGT TCAGGAAAAA 

301 TTCGACAAAT ACGGCAACTG GGTGTTATTT GTCGCTCGTT TCCTGCCCGG

351 TTTGCGGACT GCCGTTTTCG TTACCGCCGG CATCAGCCGC AAAGTATCGT 

401 ATCTGCGCTT TCTGATTATG GACGGGCTTG CCGCGCTGAT TTCCGTGCCC

451 GTTTGGATTT ACTTGGGCGA GTACGGCGCG CACAACATCG ATTGGCTGAT

501 GGCGAAAATG CACAGCCTGC AATCCGGCAT CTTCATCGCA TTGGGCGTGC 

551 TGGCGGCGGC GCTGGCGTGG TTCTGGTGGC GCAAACGCCG ACATTATCAG

601 CTTTACCGCG CACAATTGAG CGAAAAACGC GCCAAACGCA AGGCGGAAAA 

651 GGCAGCGAAA AAAGCGGCAC AGAAGCAGCA GTAA 

This encodes a protein having amino acid sequence <SEQ ID 724>:

1 MFALLEAFFV EYGYAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP
51 HIMFAVGMLG VLVGDGIMFA AGRIWGQKIL KFKPIARIMT PKRYAQVQEK
101 FDKYGNWVLF VARFLPGLRT AVFVTAGISR KVSYLRFLIM DGLAALISVP
151 VWIYLGEYGA HNIDWLMAKM HSLQSGIFIA LGVLAAALAW FWWRKRRHYQ
201 LYRAQLSEKR AKRKAEKAAK KAAQKQQ*

ORF78a and ORF78-1 show 89.0% identity in 227 aa overlap: 

                    10        20        30        40        50        60
orf78a.pep  MFALLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG
            |||:||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf78-1     MFAFLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf78a.pep  VLVGDGIMFAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGNWVLFVARFLPGLRT
            ||||||||||||||||||||:||||||||||||| |||||||||||||||||||||||||
orf78-1     VLVGDGIMFAAGRIWGQKILRFKPIARIMTPKRYEQVQEKFDKYGNWVLFVARFLPGLRT
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf78a.pep  AVFVTAGISRKVSYLRFLIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIA
            |||||||||||||||||:||||||||||||:|||||||||||||||||||||||||||:
orf78-1     AVFVTAGISRKVSYLRFIIMDGLAALISVPIWIYLGEYGAHNIDWLMAKMHSLQSGIFVI
                   130       140       150       160       170       180

                   190       200       210       220
orf78a.pep  LGVLAAALAWFWWRKRRHYQLYRAQLSEKRAKRKAEKAAKKAAQKQQX
            ||: |:::||:||:||:: |:||::|:||||:||| ||||||||::||
orf78-1     LGIGATVVAWIWWKKRQRIQFYRSKLKEKRAQRKAAKAAKKAAQSKQX
                   190       200       210       220



Homology with a predicted ORF from N. gonorrhoeae

ORF78 shows 97.4% identity over 38 aa overlap with a predicted ORF (ORF78ng) from
N. gonorrhoeae:

orf78.pep   XXLXFXPIAXIMTPXRYEQVQEKFDKYGNWVLFVARFLPGLRTAVFVTAGISRKVSYLRF 137
                                          ||||||||||||||||||||||||||||||
orf78ng                                 YPVLFVARFLPGLRTAVFVTAGISRKVSYLRF 32

orf78.pep   IIMDGLAA 145
            :|||||||
orf78ng     LIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIALGVLAAALAWFWWRKRR 92

The ORF78ng nucleotide sequence <SEQ ID 725> is predicted to encode a protein comprising 
amino acid sequence <SEQ ID 726>: 



1 ..YPVLFVARFL PGLRTAVFVT AGISRKVSYL RFLIMDGLAA LISVPVWIYL
51 GEYGAHNIDW LMAKMHSLQS GIFIALGVLA AALAWFWWRK RRHYQLYRAQ
101 LSEKRAKRKA EKAAKKAAQK QQ*

Further work revealed the complete gonococcal nucleotide sequence <SEQ ID 727>:



1 atgtttgccc tttTggaagc CTTTTTTGTC GAAtacggCt atgcGGCCGT 

51 GTTTTTCGTT TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAAGATT 

101 TGACCTTGGT AACGGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG 

151 CATATTATGT TTGCGGTCGG TATGCTCGGC GTGTTGGCGG GCGACGGCGT 

201 GATGTTTGCC GCCGGACGCA TCTGGGGGCA GAAAATCCTC AAGTTCAAAC 

251 CGATTGCGCG CATCATGACG CCGAAACGTT ACGCGCAGGT TCAGGAAAAA

301 TTCGACAAAT ACGGCAACTG GGTTCTGTTT GTCGCCCGTT TCCTGCCGGG 

351 TTTGCGGACT GCCGTTTTCG TTACCGCCGG CATCAGCCGC AAAGTATCGT 

401 ATCTGCGCTT TCTGATTATG GACGGGCTGG CCGCGCTGAT TTCCGTGCCC

451 GTTTGGATTT ACTTGGGCGA GTACGGCGCG CACAACATCG ATTGGCTGAT

501 GGCGAAAATG CACAGCCTGC AATCGGGCAT CTTCATCGCA TTGGGCGTGC

551 TGGCGGCGGC GCTGGCGTGG TTCTGGTGGC GCAAACGCCG ACATTATCAG

601 CTTTACCGCG CACAATTGAG CGAAAAACGC GCCAAACGCA AGGCGGAAAA 

651 GGCAGCGAAA AAAGCGGCAC AGAAGCAGCA GTAa 
This corresponds to the amino acid sequence <SEQ ID 728; ORF78ng-1>:

1 MFALLEAFFV EYGYAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP

51 HIMFAVGMLG VLAGDGVMFA AGRIWGQKIL KFKPIARIMT PKRYAQVQEK

101 FDKYGNWVLF VARFLPGLRT AVFVTAGISR KVSYLRFLIM DGLAALISVP

151 VWIYLGEYGA HNIDWLMAKM HSLQSGIFIA LGVLAAALAW FWWRKRRHYQ

201 LYRAQLSEKR AKRKAEKAAK KAAQKQQ*

ORF78ng-1 and ORF78-1 show 88.1% identity in 227 aa overlap:

                    10        20        30        40        50        60
orf78-1.pep MFAFLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG
            |||:||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf78ng-1   MFALLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf78-1.pep VLVGDGIMFAAGRIWGQKILRFKPIARIMTPKRYEQVQEKFDKYGNWVLFVARFLPGLRT
            ||:|||:|||||||||||||:||||||||||||| |||||||||||||||||||||||||
orf78ng-1   VLAGDGVMFAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGNWVLFVARFLPGLRT
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf78-1.pep AVFVTAGISRKVSYLRFIIMDGLAALISVPIWIYLGEYGAHNIDWLMAKMHSLQSGIFVI
            |||||||||||||||||:||||||||||||:|||||||||||||||||||||||||||:
orf78ng-1   AVFVTAGISRKVSYLRFLIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIA
                   130       140       150       160       170       180

                   190       200       210       220
orf78-1.pep LGIGATVVAWIWWKKRQRIQFYRSKLKEKRAQRKAAKAAKKAAQSKQX
            ||: |:::||:||:||:: |:||::|:||||:||| ||||||||::||
orf78ng-1   LGVLAAALAWFWWRKRRHYQLYRAQLSEKRAKRKAEKAAKKAAQKQQX
                   190       200       210       220

Furthermore, ORF78ng-1 shows homology to the dedA protein from H. influenzae:



sp|P45280|YG29_HAEIN HYPOTHETICAL PROTEIN HI1629 >gi|1073983|pir||D64133 dedA
protein (dedA) homolog - Haemophilus influenzae (strain Rd KW20)
>gi|1574476 (U32836) dedA protein (dedA) [Haemophilus influenzae] Length = 212
Score = 223 bits (563), Expect = 7e-58
Identities = 108/182 (59%), Positives = 140/182 (76%), Gaps = 2/182 (1%)

Query: 5    LEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGM--GYTNPHIMFAVGMLGVL 62
            L  FF EYGY AV FVL+ICGFGVPIPED+TLV+GGVI+G+     N H+M  V M+GVL
Sbjct: 21   LIGFFTEYGYWAVLFVLIICGFGVPIPEDITLVSGGVIAGLYPENVNSHLMLLVSMIGVL 80

Query: 63   AGDGVMFAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGNWVLFVARFLPGLRTAV 122
            AGD  M+  GRI+G KIL+F+PI RI+T +R   V+EKF +YGN VLFVARFLPGLR  +
Sbjct: 81   AGDSCMYWLGRIYGTKILRFRPIRRIVTLQRLRMVREKFSQYGNRVLFVARFLPGLRAPI 140

Query: 123  FVTAGISRKVSYLRFLIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIALG 182
            ++ +GI+R+VSY+RF+++D  AA+ISVP+WIYLGE GA N+DWL  ++   Q  I+I +G
Sbjct: 141  YMVSGITRRVSYVRFVLIDFCAAIISVPIWIYLGELGAKNLDWLHTQIQKGQIVIYIFIG 200

Query: 183  VL 184
             L
Sbjct: 201  YL 202



Based on this analysis, including the presence of putative transmembrane domains, it is predicted 
that these proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful 
antigens for vaccines or diagnostics, or for raising antibodies. 
Example 87

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 729>: 

1 ATGAAAAAAT TATTGGCGGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT 

51 TTCCGCCGCC GGAGTCCACG TTGAGGACGG CTGGGCGCGC ACCACCGTCG 

101 AAGGTATGAA AATAGGCGGC GCGTTCATGA AAATCCACAA CGACGAAGCC

151 AAACAAGACT TTTTGCTCGG CGGAAGCAGC CCCGTTGCCG ACCGCGTCGA 

201 AGTGCATACC CACATCAACG ACAACGGCGT GATGCGGATG CGCGAAGTCG 

251 AAGGCGGCGT GCCTTTGGAA GCGAAATCCG TTACCGAACT CAAACCCGGC 

301 AGCTATCATG TGATGTTTAT GGGTTTGAAA AAACAATTAA AAGAGGGCGA 

351 TAAAATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCG CAAACCGTCC 

401 AACTGGAAGT CAAAATCGCG CCGATGCCGG CAATGAACCA C...

This corresponds to the amino acid sequence <SEQ ID 730; ORF79>: 



1 MKKLLAAVMM AGLAGAVSAA GVHVEDGWAR TTVEGMKIGG AFMKIHNDEA
51 KQDFLLGGSS PVADRVEVHT HINDNGVMRM REVEGGVPLE AKSVTELKPG 
101 SYHVMFMGLK KQLKEGDKIP VTLKFKNAKA QTVQLEVKIA PMPAMNH . . 

Further work revealed the complete nucleotide sequence <SEQ ID 731>:



1 ATGAAAAAAT TATTGGCGGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT 

51 TTCCGCCGCC GGAGTCCACG TTGAGGACGG CTGGGCGCGC ACCACCGTCG 

101 AAGGTATGAA AATAGGCGGC GCGTTCATGA AAATCCACAA CGACGAAGCC

151 AAACAAGACT TTTTGCTCGG CGGAAGCAGC CCCGTTGCCG ACCGCGTCGA 

201 AGTGCATACC CACATCAACG ACAACGGCGT GATGCGGATG CGCGAAGTCG 

251 AAGGCGGCGT GCCTTTGGAA GCGAAATCCG TTACCGAACT CAAACCCGGC 

301 AGCTATCATG TGATGTTTAT GGGTTTGAAA AAACAATTAA AAGAGGGCGA 

351 TAAAATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCG CAAACCGTCC 

401 AACTGGAAGT CAAAATCGCG CCGATGCCGG CAATGAACCA CGGTCATCAC

451 CACGGCGAAG CGCATCAGCA CTAA

This corresponds to the amino acid sequence <SEQ ID 732; ORF79-1>:



1 MKKLLAAVMM AGLAGAVSAA GVHVEDGWAR TTVEGMKIGG AFMKIHNDEA
51 KQDFLLGGSS PVADRVEVHT HINDNGVMRM REVEGGVPLE AKSVTELKPG 
101 SYHVMFMGLK KQLKEGDKIP VTLKFKNAKA QTVQLEVKIA PMPAMNHGHH 
151 HGEAHQH* 

Computer analysis of this amino acid sequence revealed a putative leader peptide and also gave the 
following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF79 shows 94.6% identity over a 147aa overlap with an ORF (ORF79a) from strain A of N. 
meningitidis: 



                    10        20        30        40        50        60
orf79.pep   MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKIGGAFMKIHNDEAKQDFLLGGSS
            || ||||||||||||||||||:|||||||||||||||:||||||||||||||||||||||
orf79a      MKXLLAAVMMAGLAGAVSAAGIHVEDGWARTTVEGMKMGGAFMKIHNDEAKQDFLLGGSS
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf79.pep   PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP
            |||||||||||||||||||||||||||||||||||||||||||||||| ||||| |||||
orf79a      PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGXKKQLKXGDKIP
                    70        80        90       100       110       120

                   130       140
orf79.pep   VTLKFKNAKAQTVQLEVKIAPMPAMNH
            |||||||||||||||||| ||| ||:|
orf79a      VTLKFKNAKAQTVQLEVKTAPMSAMDHGHHHGEAHQHX
                   130       140       150
The complete length ORF79a nucleotide sequence <SEQ ID 733> is:

1 ATGAAANAAC TATTGGCAGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT 

51 TTCCGCCGCC GGAATCCACG TTGAGGACGG CTGGGCGCGC ACCACCGTCG 

101 AAGGTATGAA AATGGGCGGC GCGTTCATGA AAATCCACAA CGACGAAGCC 

151 AAACAAGACT TTTTGCTCGG CGGAAGCAGC CCTGTTGCCG ACCGCGTCGA 

201 AGTGCATACC CATATCAATG ATAACGGTGT GATGCGGATG CGCGAAGTCG 

251 AAGGCGGCGT GCCTTTGGAG GCGAAATCCG TTACCGAACT CAAACCCGGC 

301 AGCTATCATG TCATGTTTAT GGGTNTGAAA AAACAATTAA AAGANGGCGA 

351 CAAGATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCA CAAACCGTCC 

401 AACTGGAAGT CAAAACCGCG CCGATGTCGG CAATGGACCA CGGTCATCAC 

451 CACGGCGAAG CGCATCAGCA CTAA 

This encodes a protein having amino acid sequence <SEQ ID 734>: 



1 MKXLLAAVMM AGLAGAVSAA GIHVEDGWAR TTVEGMKMGG AFMKIHNDEA 
51 KQDFLLGGSS PVADRVEVHT HINDNGVMRM REVEGGVPLE AKSVTELKPG 
101 SYHVMFMGXK KQLKXGDKIP VTLKFKNAKA QTVQLEVKTA PMSAMDHGHH 
151 HGEAHQH* 
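
The relationship between the nucleotide sequence <SEQ ID 733> and the amino acid sequence <SEQ ID 734> can be illustrated with a minimal sketch that translates DNA using the standard genetic code. The function name `translate` and the convention of writing X for any codon that contains an undetermined base (N) are assumptions made for this illustration; the specification does not state which software performed the translation.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in reading frame 1.

    Whitespace is ignored. A codon that is not in the table (for
    example one containing N) is reported as 'X'; this is an assumed
    convention for the sketch.
    """
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3  # drop any trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[pos:pos + 3], "X") for pos in range(0, usable, 3)
    )


# First 30 and last 24 bases of the ORF79a sequence listed above.
start = translate("ATGAAANAAC TATTGGCAGC CGTGATGATG")
end = translate("CACGGCGAAG CGCATCAGCA CTAA")
print(start, end)
```

The undetermined base at position 7 falls in the third codon, which is why the third residue of <SEQ ID 734> is written as X.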

ORF79a and ORF79-1 show 94.9% identity in 157 aa overlap: 



10 20 30 40 50 60 

orf79a.pep MKXLLAAVMMAGLAGAVSAAGIHVEDGWARTTVEGMKMGGAFMKIHNDEAKQDFLLGGSS 
II I I I I II i I 1 M I M I II t : t t f II I M I II I I I I : I I I I I t t M t I I I I I I I it I I I 
orf79-1 MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKIGGAFMKIHNDEAKQDFLLGGSS 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf79a.pep PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGXKKQLKXGDKIP 
I ! I I I I I M I II I I I 1 I I I I II I I I I I I I I I I I I II I I I i I I II I I i I I I I II Mill 
orf79-1 PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP 

70 80 90 100 110 120 



130 140 150 

orf79a.pep VTLKFKNAKAQTVQLEVKTAPMSAMDHGHHHGEAHQHX 
I I I I I I I I I I I I I I M I I (It I I : I I I I I I I I I I I I 
orf79-1 VTLKFKNAKAQTVQLEVKIAPMPAMNHGHHHGEAHQHX 

130 140 150 
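
The identity figure quoted for ORF79a and ORF79-1 can be checked with a short sketch. It assumes an ungapped, position-by-position comparison and counts an undetermined residue (X) as a mismatch; both are assumptions of this illustration rather than a description of the alignment program used. The name `percent_identity` is hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned positions at which both sequences carry the same letter."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# SEQ ID 734 (ORF79a) and SEQ ID 732 (ORF79-1), without the stop symbol.
ORF79A = (
    "MKXLLAAVMM" "AGLAGAVSAA" "GIHVEDGWAR" "TTVEGMKMGG" "AFMKIHNDEA"
    "KQDFLLGGSS" "PVADRVEVHT" "HINDNGVMRM" "REVEGGVPLE" "AKSVTELKPG"
    "SYHVMFMGXK" "KQLKXGDKIP" "VTLKFKNAKA" "QTVQLEVKTA" "PMSAMDHGHH"
    "HGEAHQH"
)
ORF79_1 = (
    "MKKLLAAVMM" "AGLAGAVSAA" "GVHVEDGWAR" "TTVEGMKIGG" "AFMKIHNDEA"
    "KQDFLLGGSS" "PVADRVEVHT" "HINDNGVMRM" "REVEGGVPLE" "AKSVTELKPG"
    "SYHVMFMGLK" "KQLKEGDKIP" "VTLKFKNAKA" "QTVQLEVKIA" "PMPAMNHGHH"
    "HGEAHQH"
)

identity = percent_identity(ORF79A, ORF79_1)
print(f"{identity:.1f}% identity in {len(ORF79A)} aa overlap")
```

Under these assumptions, 149 of the 157 positions match, which agrees with the 94.9% stated above.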



Homology with a predicted ORF from N. gonorrhoeae 

ORF79 shows 96.1% identity over 76 aa overlap with a predicted ORF (ORF79ng) from 
N. gonorrhoeae: 

orf79.pep FMKIHNDEAKQDFLLGGSSPVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGS 101 

I I I I I I I I 1 I I I : I I I I I I II I I I I I I I I I 
orf79ng INDNGVMRMREVKGGVPLEAKSVTELKPGS 30 



orf79.pep YHVMFMGLKKQLKEGDKIPVTLKFKNAKAQTVQLEVKIAPMPAMNH 147 

I II I I II M I I I II I II II I M I I I I I I I I I I I I I I I IK I I II 
orf79ng YHVMFMGLKKQLKEGDKIPVTLKFKNAKAQTVQLEVKTAPMSAMNHGHHHGEAHQH 86 

An ORF79ng nucleotide sequence <SEQ ID 735> was predicted to encode a protein comprising 
amino acid sequence <SEQ ID 736>: 



1 . . INDNGVMRMR EVKGGVPLEA KSVTELKPGS YHVMFMGLKK QLKEGDKIPV 
51 TLKFKNAKAQ TVQLEVKTAP MSAMNHGHHH GEAHQH* 

Further work revealed the complete gonococcal DNA sequence <SEQ ID 737>: 



1 ATGAAAAAAT TATTGGCAGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT 

51 TTccgccgCc GGagTccAtG TCGAggACGG CTGGGCGCGc accaCTGtcg 

101 aaggtATgaa aatggGCGGC GCgttCATga aaATCCACAA CGACGaaGcc 

151 atacaaGACt ttgtgcTCgg CGGaagcatg cccgttgccg accgcGTCGA 

201 AGTGCAtaca cacATCAACG ACAACGGCGT GATGCGTATG CGCGAAGTCA 

251 AAGGCGGCGT GCCTTTGGAG GCGAAATCCG TTACCGAACT CAAACCCGGC 

301 AGCTATCACG TGATGTTTAT GGGTTTGAAA AAACAACTGA AAGAGGGCGA 

351 CAAGATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCG CAAACCGTCC 

401 AACTGGAAGT CAAAACCGCG CCGATGTCGG CAATGAACCA CGGTCATCAC 

451 CACGGCGAAG CGCATCAGCA CTAA 

This corresponds to the amino acid sequence <SEQ ID 738; ORF79ng-1>: 

1 MKKLLAAVMM AGLAGAVSAA GVHVEDGWAR TTVEGMKMGG AFMKIHNDEA 

51 IQDFVLGGSM PVADRVEVHT HINDNGVMRM REVKGGVPLE AKSVTELKPG 

101 SYHVMFMGLK KQLKEGDKIP VTLKFKNAKA QTVQLEVKTA PMSAMNHGHH 

151 HGEAHQH* 

ORF79ng-1 and ORF79-1 show 95.5% identity in 157 aa overlap: 

10 20 30 40 50 60 

orf79-1.pep MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKIGGAFMKIHNDEAKQDFLLGGSS 
I | | | | I II I I I M I M I I I I I I I H I 1 I I I I I I I I M : I I I I M I I I I M 111:11)1 
orf79ng-1 MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKMGGAFMKIHNDEAIQDFVLGGSM 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf79-1.pep PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP 
I I I I M I I I I I I I II I I I I I I 1 I : M I I I 1 I I I II I I ! I I M I I I I M M I I I M I I M I 
orf79ng-1 PVADRVEVHTHINDNGVMRMREVKGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP 

70 80 90 100 110 120 

130 140 150 

orf79-1.pep VTLKFKNAKAQTVQLEVKIAPMPAMNHGHHHGEAHQHX 
I I I I I I I I I 11 I I I I I II 111 ) I I 1 I I I II I I I I I I 
orf79ng-1 VTLKFKNAKAQTVQLEVKTAPMSAMNHGHHHGEAHQHX 

130 140 150 

Furthermore, ORF79ng-l shows significant homology to a protein from Aquifex aeolicus: 

gi|2983695 (AE000731) putative protein [Aquifex aeolicus] Length = 151 

Score = 63.6 bits (152), Expect = 6e-10 

Identities = 38/114 (33%), Positives = 58/114 (50%), Gaps = 1/114 (0%) 

Query: 24 VEDGWARTTVEGMKMGGAFMKIHNDEAIQDFVLGGSMPVADRVEVHTHINDNGVMRMREV 83 

V+ W G M I N+ D+++G +A RVE+H + +N V +M 

Sbjct: 27 VKHPWVMEPPPGPNTTMMGMIIVNEGDEPDYLIGAKTDIAQRVELHKTVIENDVAKMVPQ 86 

Query: 84 KGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIPVTLKFKNAKAQTVQLEV 137 
+ + + K E K YHVM +GLKK++KEGDK+ V L F+ + TV+ V 
Sbjct: 87 ER-IEIPPKGKVEFKHHGYHVMIIGLKKRIKEGDKVKVELIFEKSGKITVEAPV 139 
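
The percentages in the report above are consistent with integer truncation of each ratio (58/114 is roughly 50.9%, yet it is printed as 50%). The sketch below reproduces the three figures on that assumption; whether the search program always truncates is not stated in the source, and the helper name `report_percent` is hypothetical.

```python
def report_percent(count: int, length: int) -> int:
    """Integer percentage obtained by truncation (assumed reporting convention)."""
    return (100 * count) // length


ALIGNMENT_LENGTH = 114  # aligned columns, including the single gap

for label, count in (("Identities", 38), ("Positives", 58), ("Gaps", 1)):
    pct = report_percent(count, ALIGNMENT_LENGTH)
    print(f"{label} = {count}/{ALIGNMENT_LENGTH} ({pct}%)")
```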



Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF79-1 (15.6kDa) was cloned in the pET vector and expressed in E. coli, as described above. The 
products of protein expression and purification were analyzed by SDS-PAGE. Figure 18A shows 
the results of affinity purification of the His-fusion protein. Purified His-fusion protein was used 
to immunise mice, whose sera were used for ELISA (positive result) and FACS analysis (Figure 
18B). These experiments confirm that ORF79-1 is a surface-exposed protein, and that it is a useful 
immunogen. 

Example 88 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 
739>: 

1 ATGACGGTAA CTGCGGCCGA AGGCGGCAAA GCTGCCAAGG CGTTAAAAAA 

51 ATATCTGATT ACGGGCATTT TGGTCTGGCT GCCGATTGCG GTAACGGTTT 

101 GGGTGGTTTC CTATATCGTT TCCGCGTCCG ATCAGCTCGT CAACCTGCTG 

151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCGGGGCT 

201 GGGCGTTATC GTTGCCATTG CCGTATTGTT TGTAACCGGA TTGTTTGCCG 

251 CCAACGTATT GGGTCGGCAG ATCCTCGCCG CGTGGGACAG CCTGTTGGGG 

301 CGGATTCCGG TTGTGAAAtC CATCTATTCG AGTGTGAAAA AAGTATCCGA 

351 ATacgTGCTG TCCGACAGCA GCCGTTCGTT TAAAACGCCG GTACTCGTGC 

401 CGTTTCCCCA GCCCGGTATT TGGACGATyG CTTTCGTGTC AGGGCAGGTG 

451 TCGAATGCGG TTAAGGCCGC ATTGCCGAAs GACGGCGATT ATCTTTCCGT 

501 GTATGTTCCG ACCACGCCGA ATCCGACCGG CGGTTACTAT ATTATGGTAA 

551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AsCATTGAAA 

601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC 

651 ATTGGCAsGA CCTATGCCGT CTGAAAAGGC GGATTTGCCC GAACAACAAT 

701 AA 

This corresponds to the amino acid sequence <SEQ ID 740; ORF98>: 

1 MTVTAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL 

51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLG 

101 RIPVVKSIYS SVKKVSEYVL SDSSRSFKTP VLVPFPQPGI WTIAFVSGQV 

151 SNAVKAALPX DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEXLK 

201 YVISLGMVIP DDLPVKTLAX PMPSEKADLP EQQ* 
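
The lowercase letters y and s in <SEQ ID 739> are read here as standard IUPAC ambiguity codes (y = C or T, s = C or G), which is an interpretation rather than something the text states. The sketch below, under that assumption, expands an ambiguous codon into every codon it could represent and reports the residue only if all expansions agree, otherwise X. The function names are hypothetical.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def possible_residues(codon: str) -> set:
    """Every amino acid that an ambiguous codon could encode."""
    options = [IUPAC[base] for base in codon.upper()]
    return {CODON_TABLE["".join(combo)] for combo in product(*options)}


def translate_ambiguous(codon: str) -> str:
    """Return the residue if it is unambiguous, otherwise 'X'."""
    residues = possible_residues(codon)
    return next(iter(residues)) if len(residues) == 1 else "X"


print(translate_ambiguous("ATy"))  # codon 143 of SEQ ID 739
print(translate_ambiguous("AAs"))  # codon 160 of SEQ ID 739
```

This matches the listing: the ATy codon still gives isoleucine at residue 143, while the AAs codon could encode asparagine or lysine, so residue 160 of <SEQ ID 740> appears as X.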

Further work revealed the complete nucleotide sequence <SEQ ID 741>: 

1 ATGACGGAAC nTGCGGCCGA AGGCGGCAAA GCTGCCAArG CGTTAAAAAA 

51 ATATCTGATT ACGGGCATTT TGGTCTGGCT GCCGATTGCG GTAACGGTTT 

101 GGGTGGTTTC CTATATCGTT TCCGCGTCCG ATCAGCTCGT CAACCTGCTG 

151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCGGGGCT 

201 GGGCGTTATC GTTGCCATTG CCGTATTGTT TGTAACCGGA TTGTTTGCCG 

251 CCAACGTATT GGGTCGGCAG ATCCTCGCCG CGTGGGACAG CCTGTTGGGG 

301 CGGATTCCGG TTGTGAAATC CATCTATTCG AGTGTGAAAA AAGTATCCGA 

351 ATCGCTGCTG TCCGACAGCA GCCGTTCGTT TAAAACGCCG GTACTCGTGC 

401 CGTTTCCCCA GCCCGGTATT TGGACGATTG CTTTCGTGTC AGGGCAGGTG 

451 TCGAATGCGG TTAAGGCCGC ATTGCCGAAG GACGGCGATT ATCTTTCCGT 

501 GTATGTTCCG ACCACGCCGA ATCCGACCGG CGGTTACTAT ATTATGGTAA 

551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AGCATTGAAA 

601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC 

651 ATTGGCAGGA CCTATGCCGT CTGAAAAGGC GGATTTGCCC GAACAACAAT 

701 AA 

This corresponds to the amino acid sequence <SEQ ID 742; ORF98-1>: 

1 MTEXAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL 

51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLG 

101 RIPVVKSIYS SVKKVSESLL SDSSRSFKTP VLVPFPQPGI WTIAFVSGQV 

151 SNAVKAALPK DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK 

201 YVISLGMVIP DDLPVKTLAG PMPSEKADLP EQQ* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF98 shows 96.1% identity over a 233aa overlap with an ORF (ORF98a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf98.pep MTVTAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL 
I I I I I I t i I i I I I II f f I I I f f I i I I i I I I M I i I f t II I I I f I I I i I I I 1 f ! I I I | | 
orf98a MTEPAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL 
10 20 30 40 50 60 

70 80 90 100 110 120 

orf98.pep GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSEYVL 

| I I | M I M ! I I 1 I I I 1 1 I II I 1 M M N I M I I 1 1 I I 1 I I I I I I I t i i I I i < I I I •< 
orf98a GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSXSLL 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf98.pep SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNAVKAALPXDGDYLSVYVPTTPNPTGGYY 

| | I I I ! I I 1 I I I ! I I ! I I I I I I I I I I I I I I I ! I 1 I I I i I I I I I I I I I I I I I I 1 I I ! I I 
orf98a SDSSRSFKTPVLVPFPQSGIWTIAFVSGQVSNAVKAALPKDGDYLSVYVPTTPNPTGGYY 

130 140 150 160 170 180 

190 200 210 220 230 

orf98.pep IMVKKSDVRELDMSVDEXLKYVISLGMVIPDDLPVKTLAXPMPSEKADLPEQQX 
I I I I I I I I I II I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf98a IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPSEKADLPEQQX 
190 200 210 220 230 

The complete length ORF98a nucleotide sequence <SEQ ID 743> is: 

1 ATGACGGAAC CTGCGGCCGA AGGCGGCAAA GCTGCCAAGG CGTTAAAAAA 

51 ATATCTGATT ACGGGCATTT TGGTCTGGCT GCCGATTGCG GTAACGGTTT 

101 GGGTGGTTTC CTATATCGTT TCCGCGTCCG ATCAGCTCGT CAACCTGCTG 

151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCGGGGCT 

201 GGGCGTTATC GTTGCCATTG CCGTATTGTT TGTAACCGGA TTATTTGCCG 

251 CAAACGTATT GGGCCGGCAG ATTCTTGCCG CGTGGGACAG CTTGTTGGGG 

301 CGGATTCCGG TTGTGAAGTC CATCTATTCG AGTGTGAAAA AAGTATCCGA 

351 NTCGTTGCTG TCCGACAGCA GCCGTTCGTT TAAAACACCA GTACTCGTGC 

401 CGTTTCCCCA ATCGGGTATT TGGACAATCG CATTCGTGTC CGGTCAGGTG 

451 TCGAATGCGG TTAAGGCCGC ATTGCCGAAG GACGGCGATT ATCTTTCCGT 

501 GTATGTTCCG ACCACGCCGA ATCCGACCGG CGGTTACTAT ATTATGGTAA 

551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AGCGTTGAAA 

601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC 

651 ATTGGCAGGA CCTATGCCGT CTGAAAAGGC GGATTTGCCC GAACAACAAT 

701 AA 

This encodes a protein having amino acid sequence <SEQ ID 744>: 

1 MTEPAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL 

51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLG 

101 RIPVVKSIYS SVKKVSXSLL SDSSRSFKTP VLVPFPQSGI WTIAFVSGQV 

151 SNAVKAALPK DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK 

201 YVISLGMVIP DDLPVKTLAG PMPSEKADLP EQQ* 

ORF98a and ORF98-1 show 98.7% identity in 233 aa overlap: 

10 20 30 40 50 60 

orf98a.pep MTEPAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL 
III I I II 1 I I I M I I I I ! I I I I I I I I I I I I II I I I I I I I I I I I I ! I II I I 1 I II I I I I I 
orf98-1 MTEXAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL 
10 20 30 40 50 60 

70 80 90 100 110 120 

orf98a.pep GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSXSLL 
I I I I I I I I I I I I I I I I I I I 1! I I I M I I II I 11 I I! 1 I I I I I I I I I I I I I I II I I I II I 
orf98-1 GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSESLL 
70 80 90 100 110 120 

130 140 150 160 170 180 

orf98a.pep SDSSRSFKTPVLVPFPQSGIWTIAFVSGQVSNAVKAALPKDGDYLSVYVPTTPNPTGGYY 
I I M I I II I I II I I I I I I I I I I I I I I I I I I I I I I II 1 I I I I I I I I I I I I I I I I I I I I I I 
orf98-1 SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNAVKAALPKDGDYLSVYVPTTPNPTGGYY 
130 140 150 160 170 180 






190 200 210 220 230 

orf98a.pep IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPSEKADLPEQQX 
f ti I I i I I I II M I II M I M II M I M I I I I II I M I II I M I ! I I M I I II I 
orf98-1 IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPSEKADLPEQQX 
190 200 210 220 230 

Homology with a predicted ORF from N. gonorrhoeae 

ORF98 shows 95.3% identity over a 233 aa overlap with a predicted ORF (ORF98ng) from 
N. gonorrhoeae: 

10 20 30 40 50 60 

orf98.pep MTVTAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL 60 

II I I I I I I I I I I I I I I I M I M I I I I I I I I I I I 1 M II I I M M I I I I M I I I I I I II 
orf98ng MTEPAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL 60 

orf98.pep GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSEYVL 120 

I I I I ! I 1 1 1 I I I I M I M i I M M I I I I I M I I I I I II I II I M I I I I M I II I I I : I 
orf98ng GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLXRIPVVKSIYSSVKKVSESLL 120 

orf98.pep SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNAVKAALPXDGDYLSVYVPTTPNPTGGYY 180 

I M I I I I I I I M I I II I 1 I I I I I I M I I I I II I M I I I 11 I I I M M i I I M I M I I I 
orf98ng SDSSRSFKTPVLVPFPQSGIWTIAFVSGQVSNAVKAALPQDGDYLSVYVPTTPNPTGGYY 180 

orf98.pep IMVKKSDVRELDMSVDEXLKYVISLGMVIPDDLPVKTLAXPMPSEKADLPEQQ 233 

I II M 11 M 1 I II II II 1 I I I I I 1 I I I I I M I I I I M I III 111:11111 
orf98ng IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPPEKAELPEQQ 233 

The complete length ORF98ng nucleotide sequence <SEQ ID 745> is predicted to encode a protein 
having amino acid sequence <SEQ ID 746>: 



1 MTEPAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL 

51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLX 

101 RIPVVKSIYS SVKKVSESLL SDSSRSFKTP VLVPFPQSGI WTIAFVSGQV 

151 SNAVKAALPQ DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK 

201 YVISLGMVIP DDLPVKTLAG PMPPEKAELP EQQ* 

Further work revealed the complete nucleotide sequence <SEQ ID 747>: 



1 ATGACGGAAC CTGCGGCCGA AGGCGGCAAA GCTGCCAAGG CGTTAAAAAA 

51 ATATCTGATT ACAGGCATTT TGGTCTGGCT GCCGATTGCG GTAACGGTTT 

101 GGGTGGTTTC CTATATCGTT TCCGCGTCCG ACCAGCTTGT CAACCTGCTG 

151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCCGGGCT 

201 CGGCGTTATT GTTGCCATTG CCGTATTGTT TGTAACCGGA TTATTTGCCG 

251 CAAACGTGTT GGGCCGGCAG ATTCTTGCCG CGTGGGACAG CCTGTTgggg 

301 cggaTTCCGG TTGTCAAATC CATCTATTCG AGTGTGAAAA AAGTATCCGA 

351 ATCGCTGCTG TCCGACAGCA GCCGTTCGTT TAAAACGCCG GTACTCGTGC 

401 CGTTTCCCCA ATCGGGTATT TGGACAATCG CATTCGTGTC CGGTCAGGTG 

451 TCGAATGCGG TTAAGGCCGC ATTGCCGCAG GATGGCGATT ATCTTTCCGT 

501 GTATGTCCCG ACCACGCCCA ACCCGACCGG CGGTTACTAT ATTATGGTAA 

551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AGCGTTGAAA 

601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC 

651 ATTGGCAGGA CCTATGCCGC CTGAAAAGGC GGAGTTGCCC GAACAACAAT 

701 AA 

This corresponds to the amino acid sequence <SEQ ID 748; ORF98ng-1>: 

1 MTEPAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL 

51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLG 

101 RIPVVKSIYS SVKKVSESLL SDSSRSFKTP VLVPFPQSGI WTIAFVSGQV 

151 SNAVKAALPQ DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK 

201 YVISLGMVIP DDLPVKTLAG PMPPEKAELP EQQ* 

ORF98ng-1 and ORF98-1 show 97.9% identity in 233 aa overlap: 



10 20 30 40 50 60 

orf98-1.pep MTEXAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL 
Ml M I I I I I I II I I I I I II I I I I I I I I I I I I I I II I M I I I I I I I I It I 1 I I I I I ! I 1 
orf98ng-1 MTEPAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL 
10 20 30 40 50 60 

70 80 90 100 110 120 

orf98-1.pep GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSESLL 
| M | | | } | || | 1 | | 1 ! II f I 11 I I I I i I I I I I I i I I I I t I I 1 I I I I I I t I I I I I I I I I ! t 
orf98ng-1 GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSESLL 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf98-1.pep SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNAVKAALPKDGDYLSVYVPTTPNPTGGYY 
I I I M M i I I I I I i M ! I I i II i ! ( I I I I i I t t I i I I I : I I I I I M I I I M I i I t I t I I 
orf98ng-1 SDSSRSFKTPVLVPFPQSGIWTIAFVSGQVSNAVKAALPQDGDYLSVYVPTTPNPTGGYY 

130 140 150 160 170 180 

190 200 210 220 230 

orf98-1.pep IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPSEKADLPEQQX 
I M 1 I I I I I I ! I I f I I I I i I I I i I I I I I I I I I 1 I i I I I I I I t i 111:111111 
orf98ng-1 IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPPEKAELPEQQX 
190 200 210 220 230 

Based on this analysis, including the fact that the putative transmembrane domains in the 
gonococcal protein are identical to the sequences in the meningococcal protein, it is predicted that 
the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens 
for vaccines or diagnostics, or for raising antibodies. 
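
The specification does not say how the putative transmembrane domains were identified. As one possible illustration only, the sketch below scans a sequence with a Kyte-Doolittle hydropathy window. The window length of 19 and the threshold of 1.6 are a commonly cited heuristic and are assumptions here, as is the function name `hydrophobic_windows`.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(seq, window=19, threshold=1.6):
    """Return (start, mean) for each window whose mean hydropathy exceeds threshold.

    start is the 1-based index of the first residue of the window
    within seq; mean is rounded to two decimals.
    """
    hits = []
    for i in range(len(seq) - window + 1):
        mean = sum(KYTE_DOOLITTLE[aa] for aa in seq[i:i + window]) / window
        if mean > threshold:
            hits.append((i + 1, round(mean, 2)))
    return hits


# Residues 11-60 of ORF98ng-1 (SEQ ID 748).
segment = "AAKALKKYLI" "TGILVWLPIA" "VTVWVVSYIV" "SASDQLVNLL" "PKQWRPQYVL"
hits = hydrophobic_windows(segment)
print(hits[:3])
```

Windows that exceed the threshold are only candidates for membrane-spanning segments; sequences containing X would need separate handling, since X has no entry in the table.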



Example 89 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 749>: 

1 ATgAAAACGG TAGTCTGGAT TGTCGTCCTG TTTGCCGCCG CCGTCGGACT 

51 GGCGCTGGCT TCGGGCATTT ACACCGGCGA CGTGTATATC GTACTCGGAC 

101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT 

151 GCCGTCGTGG TGTGGTATTT CTTGTTTAAA TTCATTATCG G^GgTACTCA 

201 ATATCCCCGA AAAGATGCAG CGTTTCGGTT CGGCnCGTAA AGGCCkCAAG 

251 ssCGsGCTTG CCTTGAACAA GGCGGGTTTG GCGTATTTTG AAGGGCGTTT 

301 TGAAAAGGCG GAACTAGAAG CCTCACGCGT GTTGGTCAAC AAAGtAGGCC 

351 G^GAGACAAC CGGACTTTGG CATTGATGCT GrGCGCGCAC GCCGCCGGAC 

401 AGATGGAAAA CATCGAssTG CGCGACCGTT ATCTTGCGGA AATCGCCAAA 

451 CTGCCGGAAA AACAGCAGCT TTCCCGTTAT CTTTTGTTGG CGGAATCGGC 

501 GTTGAACCGG CGCGATTACG AAGCGGCGGA AGCCAATCTT CATGCGGCGG 

551 CGAAGATGAA TGCCAACCTT ACGCGCCTCG TGCGTCTGCA . ATTCGTTAC 

601 . GCTTTCGACA GGGGCGACGC GTTGCAGGTT CTGGCAAAAA CCGAAAAACT 

651 TTCCAAGGCG GGCGCGTTGG GCAAATCGGA AATGGAACGG TATCAAAATT 

701 GGGCATAT£C GTCGCCAGCT GGCGGATGCT GCCGATGCCG CCGCTTTGAA 

751 AACCTGCCTG AAGCGGATTC CCGACAGCCT CAAAAACGGG GAATTGAGCG 

801 TATCGGTTGC GGAAAAGTAC GAACGTTTGG GACTGTATGC CGATGCGGTC 

851 AAATGGGTCA AACAGCATTA TCCGCAsAAC CGCCGCCCCG AGCTTTTGGA 

901 AGCCTTTGTC GAAAGCGTGC GCTTTTTGGG CGAGCGCGAA CAGCAGAAAG 

951 CCATCGATTT TGCCGATGCT TGGCTGAAAG AACAGCCCGA TAACGCGCTT 

1001 CTGCTGATGT ATCTCGGTCG GCTCGCCTTC GGCCGCAAAC TTTGGGGCAA 

1051 GGCAAAAGGC TACCTTGAAG CGAGCATTGC ATTAAAGCCG AGTATTTCCG 

1101 CGCGTTTGGT TCTAACAAAG GTTTTCGACG AAATCGGAGA ACCGCAGAAG 

1151 GCGGAGGCGC AC. . . 

This corresponds to the amino acid sequence <SEQ ID 750; ORF100>: 

1 MKTVVWIVVL FAAAVGLALA SGIYTGDVYI VLGQTMLRIN LHAFVLGSLI 

51 AVVVWYFLFK FIIGVLNIPE KMQRFGSARK GXKXXLALNK AGLAYFEGRF 

101 EKAELEASRV LVNKVGRDNR TLALMLXAHA AGQMENIXXR DRYLAEIAKL 

151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLXIRYA 

201 FDRGDALQVL AKTEKLSKAG ALGKSEMERY QNWAYRRQLA DAADAAALKT 

251 CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP XNRRPELLEA 

301 FVESVRFLGE REQQKAIDFA DAWLKEQPDN ALLLMYLGRL AFGRKLWGKA 

351 KGYLEASIAL KPSISARLVL TKVFDEIGEP QKAEAH... 

Further work revealed the complete nucleotide sequence <SEQ ID 751>: 

1 ATGAAAACGG TAGTCTGGAT TGTCGTCCTG TTTGCCGCCG CCGTCGGACT 

51 GGCGCTGGCT TCGGGCATTT ACACCGGCGA CGTGTATATC GTACTCGGAC 

101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT 

151 GCCGTCGTGG TGTGGTATTT CTTGTTTAAA TTCATTATCG GCGTACTCAA 

201 TATCCCCGAA AAGATGCAGC GTTTCGGTTC GGCGCGTAAA GGCCGCAAGG 

251 CCGCGCTTGC CTTGAACAAG GCGGGTTTGG CGTATTTTGA AGGGCGTTTT 

301 GAAAAGGCGG AACTAGAAGC CTCACGCGTG TTGGTCAACA AAGAGGCCGG 

351 AGACAACCGG ACTTTGGCAT TGATGCTGGG CGCGCACGCC GCCGGACAGA 

401 TGGAAAACAT CGAGCTGCGC GACCGTTATC TTGCGGAAAT CGCCAAACTG 

451 CCGGAAAAAC AGCAGCTTTC CCGTTATCTT TTGTTGGCGG AATCGGCGTT 

501 GAACCGGCGC GATTACGAAG CGGCGGAAGC CAATCTTCAT GCGGCGGCGA 

551 AGATGAATGC CAACCTTACG CGCCTCGTGC GTCTGCAACT TCGTTACGCT 

601 TTCGACAGGG GCGACGCGTT GCAGGTTCTG GCAAAAACCG AAAAACTTTC 

651 CAAGGCGGGC GCGTTGGGCA AATCGGAAAT GGAACGGTAT CAAAATTGGG 

701 CATACCGCCG CCAGCTGGCG GATGCTGCCG ATGCCGCCGC TTTGAAAACC 

751 TGCCTGAAGC GGATTCCCGA CAGCCTCAAA AACGGGGAAT TGAGCGTATC 

801 GGTTGCGGAA AAGTACGAAC GTTTGGGACT GTATGCCGAT GCGGTCAAAT 

851 GGGTCAAACA GCATTATCCG CACAACCGCC GCCCCGAGCT TTTGGAAGCC 

901 TTTGTCGAAA GCGTGCGCTT TTTGGGCGAG CGCGAACAGC AGAAAGCCAT 

951 CGATTTTGCC GATGCTTGGC TGAAAGAACA GCCCGATAAC GCGCTTCTGC 

1001 TGATGTATCT CGGTCGGCTC GCCTACGGCC GCAAACTTTG GGGCAAGGCA 

1051 AAAGGCTACC TTGAAGCGAG CATTGCATTA AAGCCGAGTA TTTCCGCGCG 

1101 TTTGGTTCTA GCAAAGGTTT TCGACGAAAT CGGAGAACCG CAGAAGGCGG 

1151 AGGCGCAGCG CAACTTGGTT TTGGAAGCCG TCTCCGATGA CGAACGTCAC 

1201 GCAGCGTTAG AGCAGCATAG CTGA 

This corresponds to the amino acid sequence <SEQ ID 752; ORF100-1>: 



1 MKTVVWIVVL FAAAVGLALA SGIYTGDVYI VLGQTMLRIN LHAFVLGSLI 

51 AVVVWYFLFK FIIGVLNIPE KMQRFGSARK GRKAALALNK AGLAYFEGRF 

101 EKAELEASRV LVNKEAGDNR TLALMLGAHA AGQMENIELR DRYLAEIAKL 

151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLQLRYA 

201 FDRGDALQVL AKTEKLSKAG ALGKSEMERY QNWAYRRQLA DAADAAALKT 

251 CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP HNRRPELLEA 

301 FVESVRFLGE REQQKAIDFA DAWLKEQPDN ALLLMYLGRL AYGRKLWGKA 

351 KGYLEASIAL KPSISARLVL AKVFDEIGEP QKAEAQRNLV LEAVSDDERH 

401 AALEQHS* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF100 shows 93.5% identity over a 386aa overlap with an ORF (ORF100a) from strain A of N. 
meningitidis: 



10 20 30 40 50 60 

orf100.pep MKTVVWIVVLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK 
M I M I I M I I I I ! I 1 II I I II I I I I I II I I I I I I I I M I II I I I I I II I M M I M I 
orf100a MKTVVWIVVLFAAAXGLALASGIXTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 100 . pep FIIGVLNIPEKMQRFGSARKGXKXXLALNKAGLAYFEGRFEKAELEASRVLVNKVGRDNR 
MMIM I I I I I II ( I I I I I I II I I I I I I I II II I II II I I II I II I II : I I I 
orf100a FIIGVLNXPEKMQRFGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLGNKEAGDNR 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf 100 .pep TLALMLXAHAAGQMENIXXRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 
MINI I I I I I I I I I I I I I I I I I I I I II I I I II I M I I I I I M I I I M I I I I I I I I I 
orf 100a TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 

130 140 150 160 170 180 

190 200 210 220 230 240 






orf100.pep AAAKMNANLTRLVRLXIRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQLA 

| | ] | M I I 1 II I 1 I 1 :ll Mil I Mill III II I HI II I I I I t I M II I I M ! M 
orf100a AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKXSKAGAXGKSEMERYQNWAYRRQLX 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf100.pep DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPXNRRPELLEA 
M II II I I M M M M I I M I II II M I II II I I I I I I M I II M I II I 1 I I I I I I I I I 
orf 100a DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf100.pep FVESVRFLGEREQQKAIDFADAWLKEQPDNALLLMYLGRLAFGRKLWGKAKGYLEASIAL 
| M II M I M I : II I M M M M M II M I II I I II II II : II I I I II I M M M I I I I 
orf 100a FVESVRFLGERDQQKAIDFADAWLKEQPDNALLLXYLGRLAYGRKLWGKAKGYLEASIAL 
310 320 330 340 350 360 

370 380 
orf 100. pep KPSISARLVLTKVFDEIGEPQKAEAH 
i M II I II II : I i I I I II I I M I I : 
orf 100a KPSXSARLVLAKVFDETGEPQKAEAQRNLVLASVAEENRPSAETHX 

370 380 390 400 

The complete length ORF100a nucleotide sequence <SEQ ID 753> is: 



1 ATGAAAACGG TAGTCTGGAT TGTCGTCCTG TTTGCCGCCG CNNTCGGGCT 
51 GGCATTGGCG TCGGGCATTN ACACCGGCGA CGTGTATATC GTACTCGGAC 
101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT 
151 GCCGTCGTGG TGTGGTATTT CCTGTTCAAA TTCATCATCG GCGTACTCAA 
201 TANCCCCGAA AAGATGCAGC GTTTCGGTTC GGCGCGTAAA GGCCGCAAGG 
251 CCGCGCTTGC TTTGAACAAG GCGGGTTTGG CGTATTTTGA AGGGCGTTTT 
301 GAAAAGGCGG AACTTGAAGC CTCGCGCGTA TTGGGAAACA AAGAGGCGGG 
351 GGATAACCGG ACTTTGGCAT TGATGTTGGG CGCACATGCC GCCGGGCAGA 
401 TGGAAAACAT CGAGCTGCGC GACCGTTATC TTGCGGAAAT CGCCAAACTG 
451 CCGGAAAAGC AGCAGCTTTC CCGTTATCTT TTGTTGGCGG AATCGGCGTT 
501 GAACCGGCGC GATTACGAAG CGGCGGAAGC CAATCTTCAT GCGGCGGCGA 
551 AGATGAATGC CAACCTTACG CGCCTCGTGC GTCTGCAACT TCGTTACGCT 
601 TTCGACAGGG GCGACGCGTT GCAGGTTCTG GCAAAAACCG AAAAANTTTC 
651 CAAGGCGGGC GCGTNGGGCA AATCGGAAAT GGAACGGTAT CAAAATTGGG 
701 CATACCGCCG CCAGCTGNCG GATGCTGCCG ATGCCGCCGC TTTGAAAACC 
751 TGCCTGAAGC GGATTCCCGA CAGCCTCAAA AACGGGGAAT TGAGCGTATC 
801 GGTTGCGGAA AAGTACGAAC GTTTGGGACT GTATGCCGAT GCGGTCAAAT 
851 GGGTCAAACA GCATTATCCG CACAACCGCC GACCCGAACT TTTGGAAGCN 
901 TTTGTCGAAA GCGTGCGCTT TTTGGGCGAA CGCGATCAGC AGAAAGCCAT 
951 CGATTTTGCC GATGCTTGGC TGAAAGAACA GCCCGATAAT GCGCTTCTGC 
1001 TGANGTATCT CGGTCGGCTC GCCTACGGCC GCAAACTTTG GGGCAAGGCA 
1051 AAAGGCTACC TTGAAGCGAG CATTGCATTA AAGCCGAGTA TTTCCGCGCG 
1101 TTTGGTTCTG GCAAAGGTTT TTGACGAAAC CGGAGAACCG CAGAAGGCGG 
1151 AGGCGCAGCG CAACTTGGTT TTGGCAAGCG TTGCCGAGGA AAACCGNCCT 
1201 TCCGCCGAAA CCCATTGA 


This encodes a protein having amino acid sequence <SEQ ID 754>: 



1 MKTVVWIVVL FAAAXGLALA SGIXTGDVYI VLGQTMLRIN LHAFVLGSLI 

51 AVVVWYFLFK FIIGVLNXPE KMQRFGSARK GRKAALALNK AGLAYFEGRF 

101 EKAELEASRV LGNKEAGDNR TLALMLGAHA AGQMENIELR DRYLAEIAKL 

151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLQLRYA 

201 FDRGDALQVL AKTEKXSKAG AXGKSEMERY QNWAYRRQLX DAADAAALKT 

251 CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP HNRRPELLEA 

301 FVESVRFLGE RDQQKAIDFA DAWLKEQPDN ALLLXYLGRL AYGRKLWGKA 

351 KGYLEASIAL KPSISARLVL AKVFDETGEP QKAEAQRNLV LASVAEENRP 

401 SAETH* 



ORF100a and ORF100-1 show 95.1% identity in 406 aa overlap: 

10 20 30 40 50 60 

orf100a.pep MKTVVWIVVLFAAAXGLALASGIXTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK 
I I M 1 II II II I M M II I 11 I 1 I M II I I M I I M II I I M I I M 11 I I II II 1 I I I 
orf100-1 MKTVVWIVVLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf100a.pep FIIGVLNXPEKMQRFGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLGNKEAGDNR 

* p p mini inn iii i ii iii ii nun in nun in mil it it mi mi 

orf100-1 FIIGVLNIPEKMQRFGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLVNKEAGDNR 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf100a.pep TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 

1 1 i i i i i i i t i 1 1 m n t i m i m m i m m i m i m n i i i m i i i i m i 

orf 100-1 TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 100a . pep AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKXSKAGAXGKSEMERYQNWAYRRQLX 

1 ( | | | I I 111 I ill I I III I I I I I III I I I I I Ml HI 11 Ml III III III 11 I M 
orf100-1 AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQLA 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 100a. pep DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 

i 1 1 m ii m i m i m i i m m i m i i m i m i m m i ii mi m i ii m i m ii 

orf 100-1 DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 100a pep FVESVRFLGERDQQKAIDFADAWLKEQPDNALLLXYLGRLAYGRKLWGKAKGYLEASIAL 
M M M I I M 1:1 M M M M I M M I M M Ml I 11 M II I M I M M M II Ml Ml 
orf 100-1 FVESVRFLGEREQQKAIDFADAWLKEQPDNALLLMYLGRLAYGRKLWGKAKGYLEASIAL 

310 320 330 340 350 360 

370 380 390 400 

orf 100a. pep KPSISARLVLAKVFDETGEPQKAEAQRNLVLASVAEENRPSA-ETHX 

I M M 1 I Ml M M It M t II I M I II I M : I : : : : I : t I t 
orf 100-1 KPSISARLVLAKVFDEIGEPQKAEAQRNLVLEAVSDDERHAALEQHSX 

370 380 390 400 



Homology with a predicted ORF from N. gonorrhoeae 
ORF100 shows 93.3% identity over a 386 aa overlap with a predicted ORF (ORF100ng) from 
N. gonorrhoeae: 

orf100.pep MKTVVWIVVLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK 60 
I M M Mi Ml I II It II Ml I M Ml M 111 M Ml M M I Ml II I II I M I I M I I I 
orf100ng MKTVVWIVVLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK 60 

orf100.pep FIIGVLNIPEKMQRFGSARKGXKXXLALNKAGLAYFEGRFEKAELEASRVLVNKVGRDNR 120 
M M II I M I : I : I II M M I I I II M M M M M M M II M M M II : Ml 
orf100ng FIIGVLNIPENMRRSGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLGNKEAGDNR 120 

orf100.pep TLALMLXAHAAGQMENIXXRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 180 
MIMI MIMIIMI M II Ml M ! M I II I II Ml II II Ml M I M 1 It II Ml 
orf100ng TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 180 

orf100.pep AAAKMNANLTRLVRLXIRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQLA 240 
II II M M 11 II M I M II II I I II M M II M I II II II M M Ml II II M M IM I 
orf100ng AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQMA 240 

orf100.pep DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPXNRRPELLEA 300 
II II M M M I II II II II II II 1 II II II M M I II M M M Ml M II M M M II I 
orf100ng DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 300 

orf100.pep FVESVRFLGEREQQKAIDFADAWLKEQPDNALLLMYLGRLAFGRKLWGKAKGYLEASIAL 360 
II II M M M II M I II II II Ml I II M I It t M II It IM M 1 M II II II M M II I 
orf100ng FVESVRFLGEREQQKAIDFADSWLKEQPDNALLLMYLGRLAYGRKLWGKAKGYLEASIAL 360 

orf100.pep KPSISARLVLTKVFDEIGEPQKAEAH 386 
I I II M I I I : I I I I I : ■■ I I I i I • 
orf100ng KPSIPARLVLAKVFDETAQSQKAEAQRNLVLASVAGENRPSAETR 405 

The complete length ORF100ng nucleotide sequence <SEQ ID 755> is: 

1 ATGAAAACGG TAGTCTGGAT TGTTGTCCTG TTTGCCGCCG CCGTCGGACT 

51 GGCGCTGGCT TCGGGCATTT ACACCGGCGA CGTGTATATC GTACTCGGAC 

101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT 

151 GCCGTCGTGG TGTGGTATTT CCTGTTTAAA TTCATCATCG GCGTACTCAA 

201 TATCCCCGAA AATATGCGGC GTTCCGGTTC GGCGCGGAAA GGCCGCAAGG 

251 CCGCGCTTGC CTTGAATAAG GCGGGTTTGG CGTATTTCGA AGGGCGTTTT 

301 GAAAAGGCGG AACTCGAAGC CTCTCGAGTG TTGGGCAACA AAGAGGCCGG 

351 AGACAACCGG ACTTTGGCAT TGATGCTGGG CGCGCACGCG GCAGGACAGA 

401 TGGAAAATAT CGAGCTGCGC GACCGTTATC TTGCGGAAAT CGCCAAACTG 

451 CCGGAAAAAC AGCAGCTTTC CCGCTATCTT CTGCTGGCGG AATCGGCGTT 

501 AAACCGGCGC GATTACGAAG CGGCGGAAGC CAATCTTCAT GCGGCGGCGA 

551 AGATGAATGC CAACCTTACG CGCCTCGTGC GTCTGCAACT TCGTTACGCC 

601 TTCGATCGGG GCGATGCGTT GCAGGTTCTG GCAAAAaccG AAAAACTTTC 

651 CAAGGCGGGC GCGTTGGGCA AATCGGAAAT GGAACGGTAT CAAAATTGGG 

701 CATACCGCCG CCAGATGGCG GATGCTGCCG ATGCCGCCGC TTTGAAAACC 

751 TGCCTGAAGC GGATTCCCGA CAGCCTCAAA AACGGGGAAT TGagcGTATC 

801 GGTTGCGGAA AAGTACGAAC GTTTGGGACT GTATGCCGAT GCGGTCAAAT 

851 GGGTCAAACA GCATTATCCG CACAACCGCC GCCCCGAGCT TTTGGAAGCC 

901 TTTGTCGAAA GCGTGCGCTT TTTGGGCGAG CGCGAACAGC AGAAAGCCAT 

951 CGATTTTGCC GATTCTTGGC TGAAAGAACA GCCCGATAAC GCGCTTCTGC 

1001 TGATGTATCT CGGCCGGCTC GCCTACGGCC GCAAACTTTG GGGTAAGGCA 

1051 AAAGGCTACC TTGAAGCGAG TATTGCACTG AAGCCGAGTA TTCCGGCGCG 

1101 TTTGGTGTTG GCAAAGGTTT TTGACGAAAC CGCACAGTCG CAAAAAGCCG 

1151 AAGCACAGCG CAACTTGGTT TTGGCAAGCG TTGCCGGGGA AAACCGCCCT 

1201 TCCGCCGAAA CCCGTTGA 

This encodes a protein having amino acid sequence <SEQ ID 756>: 

1 MKTVVWIVVL FAAAVGLALA SGIYTGDVYI VLGQTMLRIN LHAFVLGSLI 

51 AVVVWYFLFK FIIGVLNIPE NMRRSGSARK GRKAALALNK AGLAYFEGRF 

101 EKAELEASRV LGNKEAGDNR TLALMLGAHA AGQMENIELR DRYLAEIAKL 

151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLQLRYA 

201 FDRGDALQVL AKTEKLSKAG ALGKSEMERY QNWAYRRQMA DAADAAALKT 

251 CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP HNRRPELLEA 

301 FVESVRFLGE REQQKAIDFA DSWLKEQPDN ALLLMYLGRL AYGRKLWGKA 

351 KGYLEASIAL KPSIPARLVL AKVFDETAQS QKAEAQRNLV LASVAGENRP 

401 SAETR* 

ORF100ng and ORF100-1 show 95.3% identity in 402 aa overlap: 

10 20 30 40 50 60 

MKTVVWIWLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVWWYFLFK 
M M I t I I I M I I I I I I I I I I I I f I i M I I I I I I M I I I I I I t I I I I I I I I I I I I I (I It 
MKTWWIWLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAWVWYFLFK 
10 20 30 40 50 60 

70 80 90 100 110 120 

FIIGVLN I PEKMQRFGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLVNKEAGDNR 
I i I I I I I I I I : I : I I I I I I I I f II I I I I II I I I I I I I I I I I M I I I II t I I I I M I I I 
FI IGVLNI PENMRRSGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLGKKEAGDNR 
70 80 90 100 110 120 

130 140 150 160 170 180 

TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 
I II II I I I M M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I II I II I I 
TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 
130 140 150 160 170 180 

190 200 210 220 230 240 

AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQLA 
I II I I I I I I I I II I I I I I I I I I I I I I I I I I II I I I I I I M I I I I I I I I I I I M I I I I I : I 
AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKLSKAGALGKSEMERY QNWAYRRQMA 
190 200 210 220 230 240 



orf 100-1. pep 
orflOOng 

orf 100-1 .pep 
orflOOng 

orf 100-1 .pep 
orflOOng 

orf 100-1. pep 
orf100ng 

                      250       260       270       280       290       300
orf100-1.pep   DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf100ng       DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA
                      250       260       270       280       290       300



                      310       320       330       340       350       360
orf100-1.pep   FVESVRFLGEREQQKAIDFADAWLKEQPDNALLLMYLGRLAYGRKLWGKAKGYLEASIAL
               |||||||||||||||||||||:||||||||||||||||||||||||||||||||||||||
orf100ng       FVESVRFLGEREQQKAIDFADSWLKEQPDNALLLMYLGRLAYGRKLWGKAKGYLEASIAL
                      310       320       330       340       350       360



370 380 390 400 

orf100-1.pep   KPSISARLVLAKVFDEIGEPQKAEAQRNLVLEAVSDDERHAALEQHSX 

II I I I M I I I I M I I : : I I I I I I I I I I I ■* I : : : I : I 
orf100ng       KPSIPARLVLAKVFDETAQSQKAEAQRNLVLASVAGENRPSAETRX 

370 380 390 400 

Based on this analysis, including the presence of a putative leader sequence, a putative 
transmembrane domain, and an RGD motif, it is predicted that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 



Example 90 



The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 757>: 



1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG 

51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA 

101 TTGATGTGCC GCGCGGCAAT CCCGAGTATG TGCGTCTGTC GGGCATGGCG 

151 GTGCGGCTGT ACCGTTTTAT GTCGCCGTTG GGCTTCGGCG CGGTCGTGTT 

201 CGGCGCGGCG ATACCGTTTG CCGCCGGCTG GTGGGGCAGC GGCTGGGTAC 

251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTACCA GTTGTATTGC 

301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG 

351 CTGGTACCGC GTGTTCAACG AAATCCCCGT GCTGCTGATG GTTGCCGCGC 

401 TGTATsTGGT CGTGTTCAAA CCGTTTTGA 

This corresponds to the amino acid sequence <SEQ ID 758; ORF102>: 



1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MAMIDVPRGN PEYVRLSGMA 
51 VRLYRFMSPL GFGAVVFGAA IPFAAGWWGS GWVHVKLCLG LMLLAYQLYC 
101 GVLLRRFQDY SNAFSHRWYR VFNEIPVLLM VAALYXVVFK PF* 
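
The correspondence between a nucleotide listing and its amino acid listing can be checked mechanically. The following is a minimal sketch, assuming the standard genetic code; the function name `translate` is illustrative and does not come from the original disclosure. Codons containing an ambiguous base (such as the lower-case `s` at position 406 of SEQ ID 757) are reported as `X` and stop codons as `*`, following the notation of the listings.

```python
# Standard genetic code with bases ordered T, C, A, G
# (assumption: NCBI translation table 1 applies to these sequences).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string in reading frame 1.

    Whitespace and digits (position numbers) are ignored. Any codon that
    contains a character other than A, C, G or T is translated as 'X'.
    """
    seq = "".join(ch for ch in dna.upper() if ch.isalpha())
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[pos:pos + 3], "X"))
    return "".join(protein)


# First 30 nucleotides of SEQ ID 757.
print(translate("ATGATGTTTT CTTGGTTCAA GCTGTTTCAC"))  # MMFSWFKLFH
# End of SEQ ID 757, starting in frame at position 403.
print(translate("TATsTGGTCGTGTTCAAACCGTTTTGA"))  # YXVVFKPF*
```

The second call shows how the single ambiguous base accounts for the `X` in the ORF102 listing.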

Further work revealed the complete nucleotide sequence <SEQ ID 759>: 



1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG 

51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA 

101 TTGATGTGCC GCGCGGCAAT CCCGAGTATG TGCGTCTGTC GGGCATGGCG 

151 GTGCGGCTGT ACCGTTTTAT GTCGCCGTTG GGCTTCGGCG CGGTCGTGTT 

201 CGGCGCGGCG ATACCGTTTG CCGCCGGCTG GTGGGGCAGC GGCTGGGTAC 

251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTACCA GTTGTATTGC 

301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG 

351 CTGGTACCGC GTGTTCAACG AAATCCCCGT GCTGCTGATG GTTGCCGCGC 

401 TGTATCTGGT CGTGTTCAAA CCGTTTTGA 

This corresponds to the amino acid sequence <SEQ ID 760; ORF102-1>: 



1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MAMIDVPRGN PEYVRLSGMA 
51 VRLYRFMSPL GFGAVVFGAA IPFAAGWWGS GWVHVKLCLG LMLLAYQLYC 
101 GVLLRRFQDY SNAFSHRWYR VFNEIPVLLM VAALYLVVFK PF* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with HP1484 hypothetical integral membrane protein of H. pylori (accession number AE000647) 
ORF102 and HP1484 show 33% aa identity in 143aa overlap: 

orf102      3 FSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPLGF 62
              F W K FH+ VISW A LFYLPR+FV A + V++ +LY F++
HP1484      8 FLWVKAFHVIAVISWMAALFYLPRLFVYHAENAHKKEFVGWQIQEK--KLYSFIASPAM 65

orf102     63 GAVVFGAAIPFAAG WWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWY 119
              G + + + GW+H KL L ++LLAY YC +R + + R+Y
HP1484     66 GFTLITGILMLLIEPTLFKSGGWLHAKLALVVLLLAYHFYCKKCMRELEKDPTRRNARFY 125

orf102    120 RVFNEIPXXXXXXXXXXXXFKPF 142
              RVFNE P             KPF
HP1484    126 RVFNEAPTILMILIVILVVVKPF 148

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF102 shows 99.3% identity over a 142aa overlap with an ORF (ORF102a) from strain A of N. 
meningitidis: 

                       10        20        30        40        50        60
orf102.pep     MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf102a        MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL
                       10        20        30        40        50        60

                       70        80        90       100       110       120
orf102.pep     GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf102a        GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR
                       70        80        90       100       110       120

                      130       140
orf102.pep     VFNEIPVLLMVAALYXVVFKPFX
               ||||||||||||||| |||||||
orf102a        VFNEIPVLLMVAALYLVVFKPFX
                      130       140

The complete length ORF102a nucleotide sequence <SEQ ID 761> is: 



1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG 

51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA 

101 TTGATGTGCC GCGCGGCAAT CCCGAGTATG TGCGTCTGTC GGGCATGGCG 

151 GTGCGGCTGT ACCGTTTTAT GTCGCCGTTG GGCTTCGGCG CGGTCGTGTT 

201 CGGCGCGGCG ATACCGTTTG CCGCCGGCTG GTGGGGCAGC GGCTGGGTAC 

251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTACCA GTTGTATTGC 

301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG 

351 CTGGTACCGC GTGTTCAACG AAATCCCCGT GCTGCTGATG GTTGCCGCGC 

401 TGTATCTGGT CGTGTTCAAA CCGTTTTGA 

This encodes a protein having amino acid sequence <SEQ ID 762>: 



1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MAMIDVPRGN PEYVRLSGMA 
51 VRLYRFMSPL GFGAVVFGAA IPFAAGWWGS GWVHVKLCLG LMLLAYQLYC 
101 GVLLRRFQDY SNAFSHRWYR VFNEIPVLLM VAALYLVVFK PF* 

ORF102a and ORF102-1 show complete identity in 142 aa overlap: 



                       10        20        30        40        50        60
orf102a.pep    MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf102-1       MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL
                       10        20        30        40        50        60

                       70        80        90       100       110       120
orf102a.pep    GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf102-1       GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR
                       70        80        90       100       110       120

                      130       140
orf102a.pep    VFNEIPVLLMVAALYLVVFKPFX
               |||||||||||||||||||||||
orf102-1       VFNEIPVLLMVAALYLVVFKPFX
                      130       140



Homology with a predicted ORF from N. gonorrhoeae 

ORF102 shows 97.9% identity over a 142 aa overlap with a predicted ORF (ORF102ng) from N. 
gonorrhoeae: 

orf102.pep     MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL 60
               |||||||||||||||||||||||||||||||||||:||||||||||||||||||||||||
orf102ng       MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDAPRGNPEYVRLSGMAVRLYRFMSPL 60

orf102.pep     GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 120
               |||||||||||||||| |||||||||||||||||||||||||||||||||||||||||||
orf102ng       GFGAVVFGAAIPFAAGRWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 120

orf102.pep     VFNEIPVLLMVAALYXVVFKPF 142
               ||||||||||||||| ||||||
orf102ng       VFNEIPVLLMVAALYLVVFKPF 142
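
The identity figure quoted for this comparison can be reproduced by counting matching columns. The sketch below is illustrative only: it assumes a gap-free alignment of equal-length sequences, assumes that the quoted percentage is identical residues divided by overlap length, and treats the ambiguous residue `X` as a mismatch. The function names are hypothetical and do not describe the software used to generate the alignments.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned columns that carry the same residue.

    Assumes the sequences are already aligned column by column (equal
    length, no gaps). An ambiguous 'X' is never counted as a match.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "X")
    return 100.0 * matches / len(seq_a)


def mismatches(seq_a: str, seq_b: str):
    """List (1-based position, residue in seq_a, residue in seq_b) per difference."""
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


# ORF102 (SEQ ID 758) and ORF102ng (SEQ ID 764), without the stop symbol.
ORF102 = (
    "MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMA"
    "VRLYRFMSPLGFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYC"
    "GVLLRRFQDYSNAFSHRWYRVFNEIPVLLMVAALYXVVFKPF"
)
ORF102NG = (
    "MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDAPRGNPEYVRLSGMA"
    "VRLYRFMSPLGFGAVVFGAAIPFAAGRWGSGWVHVKLCLGLMLLAYQLYC"
    "GVLLRRFQDYSNAFSHRWYRVFNEIPVLLMVAALYLVVFKPF"
)

print(round(percent_identity(ORF102, ORF102NG), 1))  # 97.9
print(mismatches(ORF102, ORF102NG))
# [(36, 'V', 'A'), (77, 'W', 'R'), (136, 'X', 'L')]
```

Under these assumptions, 139 of 142 positions match, which agrees with the 97.9% stated above.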

The complete length ORF102ng nucleotide sequence <SEQ ID 763> is: 



1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG 

51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA 

101 TTGATGCGCC GCGCGGCAAT CCCGAGTATG TGCGCCTGTC GGGGATGGCG 

151 GTGCGGTTGT ACCGTTTTAT GTCGCCTTTG GGTTTCGGCG CGGTCGTGTT 

201 CGGCGCGGCG ATACCGTTTG CCGCcggccg GTGGGGCagc ggctggGTTC 

251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTATCA GTTGTATTGC 

301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG 

351 CTGGTACCGC GTGTTCAAcg aAATCCCCGT GCTGCTGATG GTTGCCGCGC 

401 TGTATCTGGT CGTGTTCAAA CCGTTTTGA 

This encodes a protein having amino acid sequence <SEQ ID 764>: 



1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MAMIDAPRGN PEYVRLSGMA 
51 VRLYRFMSPL GFGAVVFGAA IPFAAGRWGS GWVHVKLCLG LMLLAYQLYC 
101 GVLLRRFQDY SNAFSHRWYR VFNEIPVLLM VAALYLVVFK PF* 

ORF102ng and ORF102-1 show 98.6% identity in 142 aa overlap: 



                       10        20        30        40        50        60
orf102-1.pep   MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL
               |||||||||||||||||||||||||||||||||||:||||||||||||||||||||||||
orf102ng       MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDAPRGNPEYVRLSGMAVRLYRFMSPL
                       10        20        30        40        50        60

                       70        80        90       100       110       120
orf102-1.pep   GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR
               |||||||||||||||| |||||||||||||||||||||||||||||||||||||||||||
orf102ng       GFGAVVFGAAIPFAAGRWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR
                       70        80        90       100       110       120

                      130       140
orf102-1.pep   VFNEIPVLLMVAALYLVVFKPFX
               |||||||||||||||||||||||
orf102ng       VFNEIPVLLMVAALYLVVFKPFX
                      130       140

In addition, ORF102ng shows significant homology to a membrane protein from H. pylori: 

gi|2314656 (AE000647) conserved hypothetical integral membrane protein 
[Helicobacter pylori] Length = 148 
Score = 79.2 bits (192), Expect = 1e-14 

Identities = 50/147 (34%), Positives = 68/147 (46%), Gaps = 13/147 (8%) 

Query: 3   FSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDAPRGNPEYVRLSGMAVRLYRFMSPLGF 62
           F W K FH+ VISW A LFYLPR+FV A + V++ +LY F+ +
Sbjct: 8   FLWVKAFHVIAVISWMAALFYLPRLFVYHAENAHKKEFVGWQIQEK--KLYSFIASPAM 65

Query: 63  GAVVFGAAIP FAAGRWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFS 115
           G + + F +G GW+H KL L ++LLAY YC +R + +
Sbjct: 66  GFTLITGILMLLIEPTLFKSG GWLHAKLALVVLLLAYHFYCKKCMRELEKDPTRRN 121

Query: 116 HRWYRVFNEIPXXXXXXXXXXXXFKPF 142
            R+YRVFNE P             KPF
Sbjct: 122 ARFYRVFNEAPTILMILIVILVVVKPF 148
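
The summary line of the BLAST report above can be read programmatically. The following sketch is an illustration only: it assumes the summary uses the `Name = count/length` layout shown above, and the function name `parse_blast_summary` is hypothetical. Truncating the percentage to a whole number reproduces the figures in this particular line (13/147 is about 8.8% and is reported as 8%).

```python
import re

SUMMARY = "Identities = 50/147 (34%), Positives = 68/147 (46%), Gaps = 13/147 (8%)"


def parse_blast_summary(line: str) -> dict:
    """Map each field name to (count, alignment length, whole-number percent)."""
    fields = {}
    for name, count, total in re.findall(r"(\w+) = (\d+)/(\d+)", line):
        count, total = int(count), int(total)
        # Integer division truncates toward zero for these positive values.
        fields[name] = (count, total, 100 * count // total)
    return fields


print(parse_blast_summary(SUMMARY))
# {'Identities': (50, 147, 34), 'Positives': (68, 147, 46), 'Gaps': (13, 147, 8)}
```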

Based on this analysis, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 91 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 765>: 

1 ATGGCAAAAA TGATGAAATG GGCGGCTGTT GCGGCGGTCG CGGCGGCAGC 

51 GGTTTGGGGC GGATGGTCTT AACTGAAGCC CGAGCCGCAC GTGCTTGATA 

101 TTACGGAAAC GGTCAGGCGC GGC // 

//.. ATTTCGTTTA CGATTTTGTC CGAACCGGAT ACGCCGATTA AGGCGAAGCT 

51 CGACAGCGTC GACCCCGGGC TGACCACGAT GTCGTCGGGC GGTTACAACA 

101 GCAGTACGGA TACGGCTTCC AATGCGGTCT ACTATTATGC CCGTTCGTTT 

151 GTGCCGAATC CGGACGGCAA ACTCGCCACG GGGATGACGA CGCAGAATAC 

201 GGTTGAAATC GACGGCGTGA AAAATGTGCT GATTATTCCG TCGCTGACCG 

251 TGAAAAATCG CGGCGGCAAG GCGTTTGTGC GCGTGTTGGG TGCGGACGGC 

301 AAGGCGGCGG AACGCGAAAT CCGGACCGGT ATGAGAGACA GTATGAATAC 

351 CGAAGTAAAA AGCGGGTTGA AAGAGGGGGA CAAAGTGGTC ATCTCCGAAA 

401 TAACCGCCGC CGAGCAACAG GAAAGCGGCG AACGCGCCCT AGGCGGCCCG 

451 CCGCGCCGAT AA 

This corresponds to the amino acid sequence <SEQ ID 766; ORF85>: 

1 MAKMMKWAAV AAVAAAAVWG GWS.LKPEPH VLDITETVRR G 

51 

101 

151 

201 I SFTILSEPDT 

251 PIKAKLDSVD PGLTTMSSGG YNSSTDTASN AVYYYARSFV PNPDGKLATG 

301 MTTQNTVEID GVKNVLIIPS LTVKNRGGKA FVRVLGADGK AAEREIRTGM 

351 RDSMNTEVKS GLKEGDKVVI SEITAAEQQE SGERALGGPP RR* 

Further work revealed the further partial nucleotide sequence <SEQ ID 767>: 

1 ..GTATCGGTCG GCGCGCAGGC ATCGGGGCAG ATTAAGATAC TTTATGTCAA 

51 ACTCGGGCAA CAGGTTAAAA AGGGCGATTT GATTGCGGAA ATCAATTCGA 

101 CCTCGCAGAC CAATACGCTC AATACGGAAA AATCCAAGTT GGAAACGTAT 

151 CAGGCGAAGC TGGTGTCGGC ACAGATTGCA TTGGGCAGCG CGGAGAAGAA 

201 ATATAAGCGT CAGGCGGCGT TATGGAAGGA AAACGCGACT TCCAAAGAGG 

251 ATTTGGAAAG CGCGCAGGAT GCGTTTGCCG CCGCCAAAGC CAATGTTGCC 

301 GAGCTGAAGG CTTTAATCAG ACAGAGCAAA ATTTCCATCA ATACCGCCGA 

351 GTCGGAATTG GGCTACACGC GCATTACCGC AACGATGGAC GGCACGGTGG 

401 TGGCGATTCT CGTGGAAGAG GGGCAGACTG TGAACGCGGC GCAGTCTACG 

451 CCGACGATTG TCCAATTGGC GAATCTGGAT ATGATGTTGA ACAAAATGCA 

501 GATTGCCGAG GGCGATATTA CCAAGGTGAA GGCGGGGCAG GATATTTCGT 

551 TTACGATTTT GTCCGAACCG GATACGCCGA TTAAGGCGAA GCTCGACAGC 

601 GTCGACCCCG GGCTGACCAC GATGTCGTCG GGCGGTTACA ACAGCAGTAC 

651 GGATACGGCT TCCAATGCGG TCTACTATTA TGCCCGTTCG TTTGTGCCGA 

701 ATCCGGACGG CAAACTCGCC ACGGGGATGA CGACGCAGAA TACGGTTGAA 

751 ATCGACGGCG TGAAAAATGT GCTGATTATT CCGTCGCTGA CCGTGAAAAA 

801 TCGCGGCGGC AAGGCGTTTG TGCGCGTGTT GGGTGCGGAC GGCAAGGCGG 

851 CGGAACGCGA AATCCGGACC GGTATGAGAG ACAGTATGAA TACCGAAGTA 

901 AAAAGCGGGT TGAAAGAGGG GGACAAAGTG GTCATCTCCG AAATAACCGC 

951 CGCCGAGCAA CAGGAAAGCG GCGAACGCGC CCTAGGCGGC CCGCCGCGCC 

1001 GATAA 

This corresponds to the amino acid sequence <SEQ ID 768; ORF85-1>: 

1 ..VSVGAQASGQ IKILYVKLGQ QVKKGDLIAE INSTSQTNTL NTEKSKLETY 

51 QAKLVSAQIA LGSAEKKYKR QAALWKENAT SKEDLESAQD AFAAAKANVA 

101 ELKALIRQSK ISINTAESEL GYTRITATMD GTVVAILVEE GQTVNAAQST 

151 PTIVQLANLD MMLNKMQIAE GDITKVKAGQ DISFTILSEP DTPIKAKLDS 

201 VDPGLTTMSS GGYNSSTDTA SNAVYYYARS FVPNPDGKLA TGMTTQNTVE 

251 IDGVKNVLII PSLTVKNRGG KAFVRVLGAD GKAAEREIRT GMRDSMNTEV 

301 KSGLKEGDKV VISEITAAEQ QESGERALGG PPRR* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF85 shows 87.8% identity over a 41aa overlap and 99.3% identity over a 153aa overlap with 
an ORF (ORF85a) from strain A of N. meningitidis: 

10 20 30 40 

orf 85 .pep MAKMMKWAAVAAVAAAAVWGGWS-LKPEPHVLDITETVRRG 
I t I I I I I I I I I I i II I I I I 11 II Mill:: I I I I I I 1 I 
orf 85a MAKMMKWAAVAAVAAAAVWGGWSYLKPEPQAAYITETVRRGDISRTVSATGEISPSNLVS 
25 10 20 30 40 50 60 

// 

80 90 100 

orf 85. pep ISFTILSEPDTPIKAKLDSVDPGLTTMSSG 

I I I I I I I I I I I I I I I I II I I I M I I I II I I 
30 orf 85a TIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSSG 

210 220 230 240 250 260 

110 120 130 140 150 160 

orf 85 . pep GYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLIIPSLTVKNRGGK 
35 I I I I I I I t I I I I I I i I I I i I I I M i I I I I I I I I II I I I I I I I I M i I I I II I I I I I I I I : 

orf 85a G YNS S T DT ASN AV YY YARS FVPN PDGKLATGMT TQN T VE I DG VKN VL 1 1 PS LT VKNRGGR 

270 280 290 300 310 320 

170 180 190 200 210 220 

40 orf 85 . pep AFVRVLGADGKAAEREIRTGMRDSMWTEVKSGLKEGDKWISEITAAEQQESGERALGGP 

I M I I I I I I M I I I I I I I I I I I I I I II I II I I 1 I M I I I I II ! I I I I I M I I I I I I M I i 
orf 85a AFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGGP 
330 340 350 360 370 380 

45 230 

orf 85. pep PRRX 
I I I I 

orf85a PRRX 
390 

The complete length ORF85a nucleotide sequence <SEQ ID 769> is: 

1 ATGGCAAAAA TGATGAAATG GGCGGCTGTT GCGGCGGTCG CGGCGGCAGC 

51 GGTTTGGGGC GGATGGTCTT ATCTGAAGCC CGAGCCGCAG GCTGCTTATA 

101 TTACGGAAAC GGTCAGGCGC GGCGACATCA GCCGGACGGT TTCTGCAACA 

151 GGGGAGATTT CGCCGTCCAA CCTGGTATCG GTCGGCGCGC AGGCATCGGG 

201 GCAGATTAAG AAACTTTATG TCAAACTCGG GCAACAGGTT AAAAAGGGCG 

251 ATTTGATTGC GGAAATCAAT TCGACCTCGC AGACCAATAC GCTCAATACG 

301 GAAAAATCCA AATTGGAAAC GTATCAGGCG AAGCTGGTGT CGGCACAGAT 

351 TGCATTGGGC AGCGCGGAGA AGAAATATAA GCGTCAGGCG GCGTTGTGGA 

401 AGGATGATGC GACCGCTAAA GAAGATTTGG AAAGCGCACA GGATGCGCTT 

451 GCCGCCGCCA AAGCCAATGT TGCCGAGCTG AAGGCTCTAA TCAGACAGAG 

501 CAAAATTTCC ATCAATACCG CCGAGTCGGA ATTGGGCTAC ACGCGCATTA 

551 CCGCAACGAT GGACGGCACG GTGGTGGCGA TTCTCGTGGA AGAGGGGCAG 

601 ACTGTGAACG CGGCGCAGTC TACGCCGACG ATTGTCCAAT TGGCGAATCT 

651 GGATATGATG TTGAACAAAA TGCAGATTGC CGAGGGCGAT ATTACCAAGG 

701 TGAAGGCGGG GCAGGATATT TCGTTTACGA TTTTGTCCGA ACCGGATACG 

751 CCGATTAAGG CGAAGCTCGA CAGCGTCGAC CCCGGGCTGA CCACGATGTC 

801 GTCGGGCGGC TACAACAGCA GTACGGATAC GGCTTCCAAT GCGGTCTACT 

851 ATTATGCCCG TTCGTTTGTG CCGAATCCGG ACGGCAAACT CGCCACGGGG 

901 ATGACGACGC AGAATACGGT TGAAATCGAC GGTGTGAAAA ATGTGCTGAT 

951 TATTCCGTCG CTGACCGTGA AAAATCGCGG CGGCAGGGCG TTTGTGCGCG 

1001 TGTTGGGTGC AGACGGCAAG GCGGCGGAAC GCGAAATCCG GACCGGTATG 

1051 AGAGACAGTA TGAATACCGA AGTAAAAAGC GGGTTGAAAG AGGGGGACAA 

1101 AGTGGTCATC TCCGAAATAA CCGCCGCCGA GCAGCAGGAA AGCGGCGAAC 

1151 GCGCCCTAGG CGGCCCGCCG CGCCGATAA 

This encodes a protein having amino acid sequence <SEQ ID 770>: 



1 MAKMMKWAAV AAVAAAAVWG GWSYLKPEPQ AAYITETVRR GDISRTVSAT 

51 GEISPSNLVS VGAQASGQIK KLYVKLGQQV KKGDLIAEIN STSQTNTLNT 

101 EKSKLETYQA KLVSAQIALG SAEKKYKRQA ALWKDDATAK EDLESAQDAL 

151 AAAKANVAEL KALIRQSKIS INTAESELGY TRITATMDGT VVAILVEEGQ 

201 TVNAAQSTPT IVQLANLDMM LNKMQIAEGD ITKVKAGQDI SFTILSEPDT 

251 PIKAKLDSVD PGLTTMSSGG YNSSTDTASN AVYYYARSFV PNPDGKLATG 

301 MTTQNTVEID GVKNVLIIPS LTVKNRGGRA FVRVLGADGK AAEREIRTGM 

351 RDSMNTEVKS GLKEGDKVVI SEITAAEQQE SGERALGGPP RR* 

ORF85a and ORF85-1 show 98.2% identity in 334 aa overlap: 



30 40 50 60 70 80 

orf85a.pep PQAAYITETVRRGDISRTVSATGEISPSNLVSVGAQASGQIKKLYVKLGQQVKKGDLIAE 

I I II I I I I I I I I I 1 1 M M I I I 1 I 11 I I I 
orf 85-1 VSVGAQASGQIKILYVKLGQQVKKGDLIAE 

10 20 30 



90 100 110 120 130 140 

orf 85a . pep INSTSQTNTLNTEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKDDATAKEDLESAQD 

1 I 1 1 11 1 M 1 1 I 1 I I 1 I 1 M I 1 I I 1 I t I 1 I I I M 1 I I 11 I 1 1 I I 1 I : : 11 : 1 I li M I I 1 
orf 8 5-1 INSTSQTNTLNTEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKENATSKEDLESAQD 
40 50 60 70 80 90 



150 160 170 180 190 200 

orf 85a . pep ALAAAKANVAELKALIRQSKISINTAESELGYTRITATMDGTWAILVEEGQTVNAAQST 
I : M ( M M M I II M M I II I I I I I I I I II I M I i I I I I I M M 11 I I I I M fl M II I 
orf 85-1 AFAAAKANVAELKALIRQSKISINTAESELGYTRITATMDGTWAILVEEGQTVNAAQST 
100 110 120 130 140 150 



210 220 230 240 250 260 

orf 8 5a. pep PTIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSS 
I I M I I I M II 1 I I I I I! I I I M M I I ! I 1 11 I II t I! M 1 M I I I I 1 M I I 1 I I M I I I 
orf 85-1 PTIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSS 
160 170 180 190 200 210 



270 280 290 300 310 320 

O r f 8 5 a . pep GGYNS STDT ASNAVY Y YARS FV PNPDGKLATGMTTQNTVE I DGVKNVL 1 1 PS LTVKNRGG 
I I I I I I I I I I I I M 1 I I t I I I 1 M I M I I I M I I I I I I I I I I I M II I I I I I 11 I I 1 I II 
orf 85-1 GGYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLIIPSLTVKNRGG 
220 230 240 250 260 270 



330 340 350 360 370 380 

orf 85a . pep RAFVRVLGADGKAAERE I RTGMRDSMNTEVKSGLKEGDKVVI SEITAAEQQE SGERALGG 
: I I 1 II I I I M I If f I M II 1 If I 1 I I II 1 M I I 1 I I I I II I 11 I I I I I II M I If 1 1 M 
orf 8 5-1 KAFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGG 
280 290 300 310 320 330 

390 

orf 8 5a. pep PPRRX 
Mill 

orf85-l PPRRX 



Figure 19D shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF85a. 

Homology with a predicted ORF from N. gonorrhoeae 

ORF85 shows a high degree of identity with a predicted ORF (ORF85ng) from N. gonorrhoeae: 

1 MAKMMKWAAVAAVAAAAVWGGWS . LKPEPHVLDITETVRRG 40 

1 | ! M 1 I 1 I 11 I I I I II 1 U t i I IlltU: Ilhltll 
1 MAKMMKW AAV AAV AAAAVWGGW S YLK PE PQAA Y I T E AVRR GD I S RT V SAT 50 



ISFTILSEPDT 250 

I I I I I I I I I I I 

201 TVNAAQSTPTIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDT 250 

251 PIKAKLDSVDPGLTTMSSGGYNSSTDTASNAVYYYARSFVPNPDGKLATG 300 

I I I I I I t M I I I I I I I I I I I I I I 1 I I 1 I M 1 M I I I It I I 1 M 1 ) I M I I 
251 PIKAKLDSVDPGLTTMSSGGYNSSTDTASNAVYYYARSFVPNPDGKLATG 300 

301 MTTQNTVEI DGVKNVL 1 1 P SLT VKNRGGKAFVRVLGADGKAAERE IRTGM 350 

I I M I I I i I I I M I II : II I I I ! I M 11 I I I I I I I I M I I I I I I 1 1 M I 
301 MT TQNTVE I DGVKN VLL I P S LT VKNRGGKAFVRVLG ADGKAVERE I RTGM 350 

351 RDSMNTEVKSGLKEGDKVVISEITAAEQQESGERALGGPPRR 393 

    :|||||||||||||||||||||||||||||||||||||||||
351 KDSMNTEVKSGLKEGDKVVISEITAAEQQESGERALGGPPRR 393 

The complete length ORF85ng nucleotide sequence <SEQ ID 771> is: 

1 ATGGCAAAAA TGATGAAATG GGCGGCTGTT GCGGCGGTCG CGGCGGCaac 

51 GGTTTGGGGC GGATGGTCTT ATCTGAAGCC CGAACCGCAG GCTGCTTATA 

101 TTACGGAaac ggTCAGGCGC GGCGATATCA GCCGGACGGT TTCCGCGACG 

151 GgcgAGATTT CGCCGTCCAA CCTGGTATCG GTCGGCGCGC AGGCTTCGGG 

201 GCAGATTAAA AAGCTTTATG TCAAACTCGG GCAACAGGTC AAAAAGGGCG 

251 ATTTGATTGC GGAAATCAAT TCGACCACGC AGACCAACAC GATCGATATG 

301 GAAAAATCCA AATTGGAAAC GTATCAGGCG AAGCTGGTGT CGGCACAGAT 

351 TGCATTGGGC AGCGCGGAGA AGAAATATAA GCGTCAGGCG GCGTTGTGGA 

401 AGGATGATGC GACCTCTAAA GAAGATTTGG AAAGCGCGCA GGATGCGCTT 

451 GCCGCCGCCA AAGCCAATGT TGCCGAGTTG AAGGCTTTAA TCAGACAGAG 

501 CAAAATTTCC ATCAATACCG CCGAGTCGGA TTTGGGCTAC ACGCGCATTA 

551 CCGCGACGAT GGACGGCACG GTGGTGGCGA TTCCCGTGGA AGAGGGGCAG 

601 ACTGTGAACG CGGCGCAGTC TACGCCGACG ATTGTCCAAT TGGCGAATCT 

651 GGATATGATG TTGAACAAAA TGCAGATTGC CGAGGGCGAT ATTACCAAGG 

701 TGAAGGCGGG GCAGGATATT TCGTTTACGA TTTTGTCCGA ACCGGATACG 

751 CCGATTAAGG CGAAGCTCGA CAGCGTCGAC CCCGGGCTGA CCACGATGTC 

801 GTCGGGCGGC TACAACAGCA GTACGGATAC GGCTTCCAAT GCGGTCTATT 

851 ATTATGCCCG TTCGTTTGTG CCGAATCCGG ACGGCAAACT CGCCACGGGG 

901 ATGACGACGC AGAATACGGT TGAAATCGAC GGTGTGAAAA ATGTGTTGCT 

951 TATTCCGTCG CTGACCGTGA AAAATCGCGG CGGCAAGGCG TTCGTACGCG 

1001 TGTTGGGTGC GGACGGCAAG GCAGTGGAAC GCGAAATCCG GACCGGTATG 

1051 AAAGACAGTA TGAATACCGA AGTGAAAAGC GGGTTGAAAG AGGGGGACAA 

1101 AGTGGTCATC TCCGAAATAA CCGCCGCCGA GCAGCAGGAA AGCGGCGAAC 

1151 GCGCCCTAGG CGGCCCGCCG CGCCGATAA 

This encodes a protein having amino acid sequence <SEQ ID 772>: 

1 MAKMMKWAAV AAVAAAAVWG GWSYLKPEPQ AAYITEAVRR GDISRTVSAT 

51 GEISPSNLVS VGAQASGQIK KLYVKLGQQV KKGDLIAEIN STTQTNTIDM 

101 EKSKLETYQA KLVSAQIALG SAEKKYKRQA ALWKDDATSK EDLESAQDAL 

151 AAAKANVAEL KALIRQSKIS INTAESDLGY TRITATMDGT VVAIPVEEGQ 

201 TVNAAQSTPT IVQLANLDMM LNKMQIAEGD ITKVKAGQDI SFTILSEPDT 

251 PIKAKLDSVD PGLTTMSSGG YNSSTDTASN AVYYYARSFV PNPDGKLATG 

301 MTTQNTVEID GVKNVLLIPS LTVKNRGGKA FVRVLGADGK AVEREIRTGM 

351 KDSMNTEVKS GLKEGDKVVI SEITAAEQQE SGERALGGPP RR* 

ORF85ng and ORF85-1 show 96.1% identity in 334 aa overlap: 

30 40 50 60 70 80 

orf85ng PQAAYITETVRRGDISRTVSATGEISPSNLVSVGAQASGQIKKLYVKLGQQVKKGDLIAE 

I I M H I I I I 1 I I I I I I I I I I 11 II I M I 
orf85-l VSVGAQASGQIKILYVKLGQQVKKGDLIAE 



ORF85 
ORF85ng 

ORF85 
ORF85ng 
ORF85 
ORF85ng 
ORF85 
ORF85ng 
ORF85 
ORF85ng 
10 



20 



30 



10 



90 100 110 120 130 140 

orf85na INSTTQTNTIDMEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKDDATSKEDLESAQD 
9 Mtt:lttl:: I I I I I I I I I ! II I I I I H 1 I I I I I I 1 I I I I I M I — I M 1 M I I I I I! 

orf85-l INSTSQTNTLNTEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKENATSKEDLESAQD 
40 50 60 70 80 90 

150 160 170 180 190 200 

orf85ng ALAAAKANVAELKALIRQSKISINTAESDLGYTRITATMDGTWAIPVEEGQTVNAAQST 
I : I I I I i I I I I II I I I I I I I I I I I I I I I ■' I I I I I I I I I I I I I I M I I M I I I I I I M I I 
orf85-l AFAAAKANVAELKALIRQSKISINTAESELGYTRITATMDGTWAILVEEGQTVNAAQST 
100 110 120 130 140 150 






210 220 230 240 250 260 

orf85ng PTIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSS 
U I I M I I I I I I M 1 I M I M I I I I I I I I I I I I I I M I I I I I M I I I I I I I I I ! I I I I I I 
orf85-l PTIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSS 
160 170 180 190 200 210 

270 280 290 300 310 320 

orf85ng GGYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLLIPSLTVKNRGG 
I I I I M I I t I I II M I I t I I M I I I I I ! I I I I I I I M I I I I M 1 I II I : I I I I I I I I I I I 
orf85-l GG YN S S T DT AS N AVY Y YARS FV PN P DGKLAT GMTT QNT VE I D GVKN VL I IPS LT VKNRGG 

220 230 240 250 260 270 

330 340 350 360 370 380 

orf85ng KAFVRVLGADGKAVEREIRTGMKDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGG 
I { I I I I I M I M I : I I I I I I M : I I M II I I I I 1 I I t I I I I I I I I I I I I I I M II I II I I 
orf85-l KAFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGG 
280 290 300 310 320 330 






orf 85ng 
orf85-l 



390 
PPRRX 
(MM 
PPRRX 






In addition, ORF85ng shows significant homology to an E. coli membrane fusion protein: 

gi|1787104 (AE000189) o380; 27% identical (27 gaps) to 332 residues from 
membrane fusion protein precursor, MTRC_NEIGO SW: P43505 (412 aa) [Escherichia 
coli] Length = 380 
Score = 193 bits (485), Expect = 2e-48 

Identities = 120/345 (34%), Positives = 182/345 (51%), Gaps = 13/345 (3%) 






Query: 


29 


Sbjct : 


41 


Query: 


89 


Sbjct: 


101 


Query: 


149 


Sbjct: 


161 


Query: 


209 


Sbjct: 


221 


Query: 


269 


Sbjct: 


274 


Query: 


329 


Sbjct: 


329 



PQAAYITETVRRGDISRTVSATGEISPSNLVSVGAQASGQIKKLYVKLGQQVKKGDLIAE 88 
P Y T VR GD+ ++V ATG++ V VGAQ SGQ+K L V +G +VKK L+ 

PVPTYQTLIVRPGDLQQSVLATGKLDALRKVDVGAQVSGQLKTLSVAIGDKVKKDQLLGV 100 

INSTTQTNTIDMEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKDDATSKEXXXXXXX 148 
1+ N I ++ L +A+ A+ L A Y RQ L + A S++ 



I++++ S++TA+++L YTRI A M G V 



+GQTV AAQ 



P 1+ LA++ ML K Q++E D+ +K GQ FT+L +P T + ++ VP 



273 



+ + ++A++YYAR VPNP+G L MT Q +++ VKNVL IP + + G 
-TPEKVNDAIFYYARFEVPNPNGLLRLDMTAQVHIQLTDVKNVLTIPLSALGDPVG 328 



+V L +G+ ERE+ G ++ + E+ GL+ GD+VVI E 

Based on this analysis, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF85-1 (40.4 kDa) was cloned in the pGex vectors and expressed in E. coli, as described above. 
The products of protein expression and purification were analyzed by SDS-PAGE. Figure 19A 
shows the results of affinity purification of the GST-fusion protein. Purified GST-fusion protein 
was used to immunise mice, whose sera were used for Western blot (Figure 19B), FACS analysis 
(Figure 19C), and ELISA (positive result). These experiments confirm that ORF85-1 is a 
surface-exposed protein, and that it is a useful immunogen. 

Example 92 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 773>: 

1 ..ATTCCCGCCA CGATGACATT TGAACGCAGC GGCAATGCTT ACAAAATCGT 

51 TTCGACGATT AAAGTGCCGC TATACAATAT CCGTTTCGAG TCCGGCGGTA 

101 CGGTTGTCGG CAATACCCTG CACCCTACCT ACTATAGAGA CATACGCAGG 

151 GGCAAACTGT ATGCGGAAgc CAAATTCGCC GACgGcAGCG TAACTTACGG 

201 CAAAGCGGGC GAGAGCAAAA CCGAGCAAAG CCCCAAGGCT ATGGATTTGT 

251 TCACGCTTGC CTGGCAGTTG GCGGCAAATG ACGCGAAACT CCCCCCGGGG 

301 CTGAAAATCA CCAACGGCAA AAAACTTTAT TCCGTCGGCG GTTTGAATAA 

351 GGCGGGTACA GGAAAATACA GCATAGGCGG CGTGGAAACC GAAGTCGTCA 

401 AATATCGGGT GCGGCGCGGC GACGATGCGG TAATGTATTT cTTCGCACCG 

451 TCCCTGAACA ATATTCCGGC ACAAATCGGC TATACCGACG ACGGCAAAAC 

501 CTATACGCTG AAACTCAAAT CGGTGCAGAT CAACGGCCAG GCAGCCAAAC 

551 CGTAA 

This corresponds to the amino acid sequence <SEQ ID 774; ORF120>: 

1 ..IPATMTFERS GNAYKIVSTI KVPLYNIRFE SGGTVVGNTL HPTYYRDIRR 

51 GKLYAEAKFA DGSVTYGKAG ESKTEQSPKA MDLFTLAWQL AANDAKLPPG 

101 LKITNGKKLY SVGGLNKAGT GKYSIGGVET EVVKYRVRRG DDAVMYFFAP 

151 SLNNIPAQIG YTDDGKTYTL KLKSVQINGQ AAKP* 

Further work revealed the complete nucleotide sequence <SEQ ID 775>: 

1 ATGATGAAGA CTTTTAAAAA TATATTTTCC GCCGCCATTT TGTCCGCCGC 

51 CCTGCCGTGC GCGTATGCGG CAGGGCTGCC CCAATCCGCC GTGCTGCACT 

101 ATTCCGGCAG CTACGGCATT CCCGCCACGA TGACATTTGA ACGCAGCGGC 

151 AATGCTTACA AAATCGTTTC GACGATTAAA GTGCCGCTAT ACAATATCCG 

201 TTTCGAGTCC GGCGGTACGG TTGTCGGCAA TACCCTGCAC CCTACCTACT 

251 ATAGAGACAT ACGCAGGGGC AAACTGTATG CGGAAGCCAA ATTCGCCGAC 

301 GGCAGCGTAA CTTACGGCAA AGCGGGCGAG AGCAAAACCG AGCAAAGCCC 

351 CAAGGCTATG GATTTGTTCA CGCTTGCCTG GCAGTTGGCG GCAAATGACG 

401 CGAAACTCCC CCCGGGGCTG AAAATCACCA ACGGCAAAAA ACTTTATTCC 

451 GTCGGCGGTT TGAATAAGGC GGGTACAGGA AAATACAGCA TAGGCGGCGT 

501 GGAAACCGAA GTCGTCAAAT ATCGGGTGCG GCGCGGCGAC GATGCGGTAA 

551 TGTATTTCTT CGCACCGTCC CTGAACAATA TTCCGGCACA AATCGGCTAT 

601 ACCGACGACG GCAAAACCTA TACGCTGAAA CTCAAATCGG TGCAGATCAA 

651 CGGCCAGGCA GCCAAACCGT AA 

This corresponds to the amino acid sequence <SEQ ID 776; ORF120-1>: 

1 MMKTFKNIFS AAILSAALPC AYAAGLPQSA VLHYSGSYGI PATMTFERSG 

51 NAYKIVSTIK VPLYNIRFES GGTVVGNTLH PTYYRDIRRG KLYAEAKFAD 

101 GSVTYGKAGE SKTEQSPKAM DLFTLAWQLA ANDAKLPPGL KITNGKKLYS 

151 VGGLNKAGTG KYSIGGVETE VVKYRVRRGD DAVMYFFAPS LNNIPAQIGY 

201 TDDGKTYTLK LKSVQINGQA AKP* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF120 shows 92.4% identity over a 184aa overlap with an ORF (ORF120a) from strain A of N. 
meningitidis: 



orf 120 .pep 
orfl20a 

orf 120. pep 
orfl20a 

orf 120. pep 
orfl20a 



10 20 30 

IPATMTFERSGNAYKIVSTIKVPLYNIRFE 
I 1 I I : II I I 1 I I I I I I ! I 1 1 I I I 

SAAILSAALPCAYAAGLPXSAVLHYSGSYGIPATXXXXXXXNAXKIVSTIKVFLYNIRFE 
10 20 30 40 50 60 

40 50 60 70 80 90 

SGGTWGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAMDLFTLAWQL 

I I M I I I I I f i I I I I i I i I I I i I I i I i ( I I I I i M I I t I : I I I I I I I I I I I I M I 
SGGTWGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAXXXXXXQSPKAMDLFTLAWQL 
70 80 90 100 110 120 

100 110 120 130 140 150 

AANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEWKYRVRRGDDAVMYFFAP 
I I I I I I I I I I I I I M I I I I I I I II I I I I I I I II I M I I I I I I ! I I I II I I I I I II I I I I I 
AANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEWKYRVRRGDDAVMYFFAP 
130 140 150 160 170 180 



160 170 180 

orf 12 0. pep SLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX 
i I II I I I I I I i I i I I I M 1 I I I i I I I I I I I I M I I 
orf 120a SLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX 
190 200 210 220 

The complete length ORF120a nucleotide sequence <SEQ ID 777> is: 



1 ATGATGAAGA CTTTTAAAAA TATATTTTCC GCCGCCATTT TGTCCGCCGC 

51 CCTGCCGTGC GCGTATGCGG CAGGGCTGCC CNAATCCGCC GTGCTGCACT 

101 ATTCCGGCAG CTACGGCATT CCCGCCACNA NNANNTNNGN ACNNNGNGNC 

151 AATGCTTNCA AAATCGTTTC GACGATTAAA GTGCCGCTAT ACAATATCCG 

201 TTTCGAGTCC GGCGGTACGG TTGTCGGCAA TACCCTGCAC CCTACCTACT 

251 ATAGAGACAT ACGCAGGGGC AAACTGTATG CGGAAGCCAA ATTCGCCGAC 

301 GGCAGCGTAA CCTACGGCAA AGCGGNNNNN ANCNNNNNNG NGCAAAGCCC 

351 CAAGGCTATG GATTTGTTCA CGCTTGCNTG GCAGTTGGCG GCAAATGACG 

401 CGAAACTCCC CCCGGGGCTG AAAATCACCA ACGGCAAAAA ACTTTATTCC 

451 GTCGGCGGTT TGAATAAGGC GGGTACAGGA AAATACAGCA TAGGCGGCGT 

501 GGAAACCGAA GTCGTCAAAT ATCGGGTGCG GCGCGGCGAC GATGCGGTAA 

551 TGTATTTCTT CGCACCGTCC CTGAACAATA TTCCGGCACA AATCGGCTAT 

601 ACCGACGACG GCAAAACCTA TACGCTGAAA CTCAAATCGG TGCAGATCAA 

651 CGGCCAGGCA GCCAAACCGT AA 

This encodes a protein having amino acid sequence <SEQ ID 778>: 



1 MMKTFKNIFS AAILSAALPC AYAAGLPXSA VLHYSGSYGI PATXXXXXXX 

51 NAXKIVSTIK VPLYNIRFES GGTVVGNTLH PTYYRDIRRG KLYAEAKFAD 

101 GSVTYGKAXX XXXXQSPKAM DLFTLAWQLA ANDAKLPPGL KITNGKKLYS 

151 VGGLNKAGTG KYSIGGVETE VVKYRVRRGD DAVMYFFAPS LNNIPAQIGY 

201 TDDGKTYTLK LKSVQINGQA AKP* 

ORF120a and ORF120-1 show 93.3% identity in 223 aa overlap: 



10 20 30 40 50 60 

orf 120a. pep MMKT FKN I FS AAI LS AAL PCAYAAG L PX S AVLHYS G S YG I PATXXXXXXXN AXK I VS T I K 
I I I I I I I II I I I I I I I I I I I II I I I I I I I I I I I I I I I 11 I I I : || I I 1 M I I 

orf 120-1 MMKTFKNIFSAAILSAALPCAYAAGLPQSAVLHYSGSYGIPATMT FERSGNAYKIVSTIK 

10 20 30 40 50 60 



70 80 90 100 110 120 

or f 120a . pep VPLYNIRFESGGTVVGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAXXXXXXQSPKAM 
t I I I I I I I I I M I I I I I I II I I I i I I I II I I i I I I I I II I I I It I I I I : I I I I I 1 
orf120-1       VPLYNIRFESGGTVVGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAM 

70 80 90 100 110 120 

130 140 150 160 170 180 

5 orf 120a pep DLFTLAWQLAANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEVVKYRVRRGD 

I ) | t I t I I I II M I ! I I I M I I I I I I I I I I I I I I I I I > 1 I I I I I I > M I I 11 1 I I I I I I I 
orf 120-1 DLFTLAWQLAANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEVVKYRVRRGD

130 140 150 160 170 180 

190 200 210 220

orf 120a. pep DAVMYFFAPSLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX 
I I 1 I M I I 1 ! I I 1 I I 1 ! I I I 1 1 1 1 I M I 1 I 1 I I I I 1 I I I i 1 ! 1 I 
orf 120-1 DAVMYFFAPSLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX

190 200 210 220 


Homology with a predicted ORF from N. gonorrhoeae 

ORF120 shows 97.8% identity over 184 aa overlap with a predicted ORF (ORF120ng) from 
N. gonorrhoeae: 

orf120.pep                                  IPATMTFERSGNAYKIVSTIKVPLYNIRFE  30
                                            ||||||||||||||||||||||||||||||
orf120ng      SAAILSAALPCAYAARLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIKVPLYNIRFE  69

orf120.pep    SGGTVVGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAMDLFTLAWQL  90
              ||||||||||||:||:||||||||||||||||||||||||||||||||||||||||||||
orf120ng      SGGTVVGNTLHPAYYKDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAMDLFTLAWQL  129

orf120.pep    AANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEVVKYRVRRGDDAVMYFFAP  150
              ||||||||||||||||||||||||||||||||||||||||||||||||||||:| |||||
orf120ng      AANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEVVKYRVRRGDDTVTYFFAP  189

orf120.pep    SLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKP  184
              ||||||||||||||||||||||||||||||||||
orf120ng      SLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKP  223

The complete length ORF120ng nucleotide sequence <SEQ ID 779> is: 

1 ATGATGAAGA CTTTTAAAAA TATATTTTCC GCCGCCATTT TGTCCGCCGC

51 CCTGCCGTGC GCGTATGCGG CAAGGCTACC CCAATCCGCC GTGCTGCACT

101 ATTCCGGCAG CTACGGCATT CCCGCCACGA TGACATTTGA ACGCAGCGGC

151 AATGCTTACA AAATCGTTTC GACGATTAAA GTGCCGCTAT ACAATATCCG

201 TTTCGAATCC GGCGGTACGG TTGTCGGCAA TACCCTGCAC CCTGCCTACT

251 ATAAAGACAT ACGCAGGGGC AAACTGTATG CGGAAGCCAA ATTCGCCGAC

301 GGCAGCGTAA CCTACGGCAA AGCGGGCGAG AGCAAAACCG AGCAAAGCCC

351 CAAGGCTATG GATTTGTTCA CGCTTGCCTG GCAGTTGGCG GCAAATGACG

401 CGAAACTCCC CCCGGGTCTG AAAATCACCA ACGGCAAAAA ACTTTATTCC

451 GTCGGCGGCC TGAATAAGGC GGGTACGGGA AAATACAGCA TaggCGGCGT

501 GGAAACCGAA GTCGTCAAAT ATCGGGTGCG GCGCGGCGAC GATACGGTAA

551 CGTATTTCTT CGCACCGTCC CTGAACAATA TTCCGGCACA AATCGGCTAT

601 ACCGACGACG GCAAAACCTA TACGCTGAAG CTCAAATCGG TGCAGATCAA

651 CGGACAGGCC GCCAAACCGT AA

This encodes a protein having amino acid sequence <SEQ ID 780>: 

1 MMKTFKNIFS AAILSAALPC AYAARLPQSA VLHYSGSYGI PATMTFERSG

51 NAYKIVSTIK VPLYNIRFES GGTVVGNTLH PAYYKDIRRG KLYAEAKFAD

101 GSVTYGKAGE SKTEQSPKAM DLFTLAWQLA ANDAKLPPGL KITNGKKLYS 

151 VGGLNKAGTG KYSIGGVETE VVKYRVRRGD DTVTYFFAPS LNNIPAQIGY 

201 TDDGKTYTLK LKSVQINGQA AKP* 

In comparison with ORF120-1, ORF120ng shows 97.8% identity in 223 aa overlap:

                      10        20        30        40        50        60
orf120-1.pep  MMKTFKNIFSAAILSAALPCAYAAGLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIK
              |||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||
orf120ng      MMKTFKNIFSAAILSAALPCAYAARLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIK
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf120-1.pep  VPLYNIRFESGGTVVGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAM
              |||||||||||||||||||||:||:|||||||||||||||||||||||||||||||||||
orf120ng      VPLYNIRFESGGTVVGNTLHPAYYKDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAM
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf120-1.pep  DLFTLAWQLAANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEVVKYRVRRGD
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf120ng      DLFTLAWQLAANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEVVKYRVRRGD
                     130       140       150       160       170       180

                     190       200       210       220
orf120-1.pep  DAVMYFFAPSLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX
              |:| ||||||||||||||||||||||||||||||||||||||||
orf120ng      DTVTYFFAPSLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX
                     190       200       210       220

This analysis, including the presence of a putative leader sequence in the gonococcal protein,
suggests that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.
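
The identity figure quoted for ORF120-1 and ORF120ng can be reproduced from the two protein sequences. The sketch below assumes an ungapped, position-by-position comparison, which is sufficient here because both proteins are 223 residues long, and it leaves out the terminal stop. `percent_identity` is a hypothetical helper name, not part of any alignment package.

```python
ORF120_1 = (
    "MMKTFKNIFSAAILSAALPCAYAAGLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIK"
    "VPLYNIRFESGGTVVGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAM"
    "DLFTLAWQLAANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEVVKYRVRRGD"
    "DAVMYFFAPSLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKP"
)
ORF120NG = (
    "MMKTFKNIFSAAILSAALPCAYAARLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIK"
    "VPLYNIRFESGGTVVGNTLHPAYYKDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAM"
    "DLFTLAWQLAANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEVVKYRVRRGD"
    "DTVTYFFAPSLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKP"
)


def percent_identity(a, b):
    """Percentage of identical positions in two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length (no gap handling)")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Positions where the two proteins differ (1-based).
differences = [i + 1 for i, (x, y) in enumerate(zip(ORF120_1, ORF120NG)) if x != y]
print(differences)
print(f"{percent_identity(ORF120_1, ORF120NG):.1f}%")
```

Five of the 223 positions differ, which gives the 97.8% stated above.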



Example 93 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 781>:

1 ATGTATCGGA GGAAAGGGCG GGGCATCAAG CCGTGGATGG GTGCCGGTGC 

51 . GCGTTTGCC GCCTTGGTCT GGCTGGTTTT CGCGCTCGGC GATACTTTGA 

101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTATTGGA CCCTTTGGTC 

151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT 

201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATCGTCC 

251 CTATGCTGGT CGGGCAGTTC AACAATTTGG CATCGCGCCT GCCCCAATTA 

301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG 

351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG CTTCAGGCGC 

401 ATACGGGAGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG

451 AGGCAGGGCG GCAATATT . .

This corresponds to the amino acid sequence <SEQ ID 782; ORF121>: 

1 MYRRKGRGIK PWMGAGXAFA ALVWLVFALG DTLTPFAVAA VLAYVLDPLV 

51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL 

101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW LQAHTGELSN ALKAWFPVLM 

151 RQGGNI.. 

Further work revealed the complete nucleotide sequence <SEQ ID 783>: 

1 ATGTATCGGA GGAAAGGGCG GGGCATCAAG CCGTGGATGG GTGCCGGTGC 

51 GGCGTTTGCC GCCTTGGTCT GGCTGGTTTT CGCGCTCGGC GATACTTTGA 

101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTATTGGA CCCTTTGGTC 

151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT 

201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATCGTCC 

251 CTATGCTGGT CGGGCAGTTC AACAATTTGG CATCGCGCCT GCCCCAATTA 

301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG 

351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG CTTCAGGCGC 

401 ATACGGGAGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG 

451 AGGCAGGGCG GCAATATTGT CAGCAGTATC GGCAACCTGC TGCTGCTTCC

501 CTTGCTGCTT TACTATTTCC TGCTGGATTG GCAGCGGTGG TCGTGCGGCA 

551 TTGCCAAACT GGTTCCGAgG CGTTTTGCCG GTGCTTATAC GCGCATTACA 

601 GGCAATTTGA ACGAGGTATT GGGCGAATTT TTGCGCGGGC AGCTTCTGGT 

651 AATGCTGATT ATGGGCTTGG TTTACGGTTT GGGATTGGTG CTGGTCGGGC 

701 TGGATTCGGG GTTTGCCATC GGTATGCTTG CCGGTATTTT GGTGTTTGTC 

751 CCTTATCTCG GGGCGTTTAC GGGATTGCTG CTTGCCACCG TCGCCGCCTT 

801 GCTCCAGTTC GGTTCGTGGA ACGGCATCCT ATCGGTTTGG GCGGTTTTTG 

851 CCGTAGGACA GTTTCTCGAA AGTTTTTTCA TTACGCCGAA AATCGTGGGA 
901 GACCGTATCG GGCTGTCGCC GTTTTGGGTT ATCTTTTCGC TGATGGCGTT

951 CGGGCAGCTG ATGGGCTTTG TCGGAATGTT GGCGGGATTG CCTTTGGCCG 

1001 CCGTAACCTT GGTCTTGCTT CGCGAGGGCG TGCAGAAATA TTTTGCCGGC 

1051 AGTTTTTACC GGGGCAGGTA G 

This corresponds to the amino acid sequence <SEQ ID 784; ORF121-1>:

1 MYRRKGRGIK PWMGAGAAFA ALVWLVFALG DTLTPFAVAA VLAYVLDPLV
51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL
101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW LQAHTGELSN ALKAWFPVLM
151 RQGGNIVSSI GNLLLLPLLL YYFLLDWQRW SCGIAKLVPR RFAGAYTRIT
201 GNLNEVLGEF LRGQLLVMLI MGLVYGLGLV LVGLDSGFAI GMLAGILVFV
251 PYLGAFTGLL LATVAALLQF GSWNGILSVW AVFAVGQFLE SFFITPKIVG
301 DRIGLSPFWV IFSLMAFGQL MGFVGMLAGL PLAAVTLVLL REGVQKYFAG
351 SFYRGR*
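
The partial sequence <SEQ ID 781> should agree with the start of the complete sequence <SEQ ID 783> wherever a base was actually called. A minimal sketch of that check follows; it assumes that `.` and `N` in the partial listing mark undetermined bases and that the partial sequence starts at position 1. `matches_prefix` is a hypothetical name.

```python
def matches_prefix(partial, complete, wildcards=".N"):
    """True if `partial` agrees with the start of `complete`.

    Whitespace is ignored, trailing dots (sequence continues) are dropped,
    and any wildcard character in `partial` matches any base.
    """
    p = "".join(partial.split()).upper().rstrip(".")
    c = "".join(complete.split()).upper()
    if len(p) > len(c):
        return False
    return all(a in wildcards or a == b for a, b in zip(p, c))


# First 70 positions of each listing, copied from the sequences above.
PARTIAL_781 = ("ATGTATCGGA GGAAAGGGCG GGGCATCAAG CCGTGGATGG GTGCCGGTGC "
               ". GCGTTTGCC GCCTTGGTCT")
COMPLETE_783 = ("ATGTATCGGA GGAAAGGGCG GGGCATCAAG CCGTGGATGG GTGCCGGTGC "
                "GGCGTTTGCC GCCTTGGTCT")

print(matches_prefix(PARTIAL_781, COMPLETE_783))
```

The only difference in this stretch is position 51, undetermined in the partial listing and `G` in the complete one, which is why residue 17 changes from `X` to `A` in the translated protein.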

Computer analysis of this amino acid sequence gave the following results:
Homology with a predicted ORF from N. meningitidis (strain A)

ORF121 shows 98.7% identity over a 156aa overlap with an ORF (ORF121a) from strain A of N. 
meningitidis: 

                      10        20        30        40        50        60
orf121.pep    MYRRKGRGIKPWMGAGXAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR
              ||||||||||||| || |||||||||||||||||||||||||||||||||||||||||||
orf121a       MYRRKGRGIKPWMDAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf121.pep    ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf121a       ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV
                      70        80        90       100       110       120

                     130       140       150
orf121.pep    EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNI
              ||||||||||||||||||||||||||||||||||||
orf121a       EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNIVSSIGNLLLLPLLLYYFLLDWQRW
                     130       140       150       160       170       180

orf121a       SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLVLVGLDSGFAI
                     190       200       210       220       230       240

The complete length ORF121a nucleotide sequence <SEQ ID 785> is:

1 ATGTATCGGA GGAAAGGGCG GGGCATCAAG CCGTGGATGG ATGCCGGTGC 

51 GGCGTTTGCC GCCTTGGTCT GGCTGGTTTT CGCGCTCGGC GATACTTTGA 

101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTATTGGA CCCTTTGGTC 

151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT 

201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATTGTCC 

251 CTATGCTGGT CGGGCAGTTC AACAATTTGG CATCGCGCCT GCCCCAATTA 

301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG 

351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG CTTCAGGCGC 

401 ATACGGGCGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG

451 AGGCAGGGCG GCAATATTGT CAGCAGTATC GGCAACCTGC TGCTGCTTCC

501 CTTGCTGCTT TACTATTTCC TGCTGGATTG GCAGCGGTGG TCGTGCGGCA 

551 TTGCCAAACT GGTTCCGAGG CGTTTTGCCG GTGCTTATAC GCGCATTACA 

601 GGCAATTTGA ACGAGGTATT GGGCGAATTT TTGCGCGGGC AGCTTCTGGT 

651 GATGCTGATT ATGGGTTTGG TTTACGGCTT GGGGTTGGTG CTGGTCGGGC 

701 TGGATTCGGG GTTTGCAATC GGTATGGTTG CCGGTATTTT GGTTTTTGTT

751 CCCTATTTGG GCGCGTTTAC AGGACTGCTG CTGGCAACCG TCGCCGCCTT 

801 GCTCCAGTTC GGTTCGTGGA ACGGCATCTT GGCTGTTTGG GCGGTTTTTG 

851 CCGTAGGACA GTTTCTCGAA AGTTTTTTCA TTACGCCGAA AATCGTGGGA 

901 GACCGTATCG GCCTGTCGCC GTTTTGGGTT ATCTTTTCGC TGATGGCGTT 

951 CGGGCAGCTG ATGGGCTTTG TCGGAATGTT GGCCGGATTG CCTTTGGCCG 

1001 CCGTAACCTT GGTCTTGCTT CGCGAGGGCG TGCAGAAATA TTTTGCCGGC 

1051 AGTTTTTACC GGGGCAGGTA G 
This encodes a protein having amino acid sequence <SEQ ID 786>:

1 MYRRKGRGIK PWMDAGAAFA ALVWLVFALG DTLTPFAVAA VLAYVLDPLV

51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL

101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW LQAHTGELSN ALKAWFPVLM

151 RQGGNIVSSI GNLLLLPLLL YYFLLDWQRW SCGIAKLVPR RFAGAYTRIT

201 GNLNEVLGEF LRGQLLVMLI MGLVYGLGLV LVGLDSGFAI GMVAGILVFV

251 PYLGAFTGLL LATVAALLQF GSWNGILAVW AVFAVGQFLE SFFITPKIVG

301 DRIGLSPFWV IFSLMAFGQL MGFVGMLAGL PLAAVTLVLL REGVQKYFAG

351 SFYRGR*

ORF121a and ORF121-1 show 99.2% identity in 356 aa overlap:

                      10        20        30        40        50        60
orf121a.pep   MYRRKGRGIKPWMDAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR
              ||||||||||||| ||||||||||||||||||||||||||||||||||||||||||||||
orf121-1      MYRRKGRGIKPWMGAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf121a.pep   ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf121-1      ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf121a.pep   EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNIVSSIGNLLLLPLLLYYFLLDWQRW
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf121-1      EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNIVSSIGNLLLLPLLLYYFLLDWQRW
                     130       140       150       160       170       180

                     190       200       210       220       230       240
orf121a.pep   SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLVLVGLDSGFAI
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf121-1      SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLVLVGLDSGFAI
                     190       200       210       220       230       240

                     250       260       270       280       290       300
orf121a.pep   GMVAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILAVWAVFAVGQFLESFFITPKIVG
              ||:||||||||||||||||||||||||||||||||||:||||||||||||||||||||||
orf121-1      GMLAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILSVWAVFAVGQFLESFFITPKIVG
                     250       260       270       280       290       300

                     310       320       330       340       350
orf121a.pep   DRIGLSPFWVIFSLMAFGQLMGFVGMLAGLPLAAVTLVLLREGVQKYFAGSFYRGRX
              |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf121-1      DRIGLSPFWVIFSLMAFGQLMGFVGMLAGLPLAAVTLVLLREGVQKYFAGSFYRGRX
                     310       320       330       340       350

Homology with a predicted ORF from N. gonorrhoeae 

ORF121 shows 97.4% identity over a 156 aa overlap with a predicted ORF (ORF121ng) from 
N. gonorrhoeae: 

orf121.pep    MYRRKGRGIKPWMGAGXAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR  60
              |||||||||||||||| |||||||||:|||||||||||||||||||||||||||||||||
orf121ng      MYRRKGRGIKPWMGAGAAFAALVWLVYALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR  60

orf121.pep    ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV  120
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf121ng      ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV  120

orf121.pep    EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNI  156
              ||||||||||:|||||||||||||||||||:|||||
orf121ng      EIDQASIIAWFQAHTGELSNALKAWFPVLMKQGGNIVSTIGNLLLPPLLLYYFLLDWHRW  180
An ORF121ng nucleotide sequence <SEQ ID 787> was predicted to encode a protein having amino
acid sequence <SEQ ID 788>:

1 MYRRKGRGIK PWMGAGAAFA ALVWLVYALG DTLTPFAVAA VLAYVLDPLV

51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL

101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW FQAHTGELSN ALKAWFPVLM

151 KQGGNIVSTI GNLLLPPLLL YYFLLDWHRW SCGIPKLVPR RFAGAYTRIT

201 GNLNKVWGKF LRGQLLGETE RGAWCRVGR ECWEGGGARS RPSDDGWPRW

251 GGG*

Further work revealed the following gonococcal DNA sequence <SEQ ID 789>:

1 ATGTATCGGA GAAAAGGACG GGGCATCAAG CCGTGGATGG GTGCCGGCGC 

51 GGCGTTTGCC GCCTTGGTCT GGCTGGTTTA CGCGCTCGGC GATACTTTGA 

101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTGTTGGA CCCTTTGGTC 

151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT 

201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATTGTCC 

251 CTATGCTGGT CGGGCAGTTC AATAATTTGG CATCTCGCCT GCCCCAATTA 

301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG 

351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG TTTCAGGCGC 

401 ATACGGGCGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG 

451 AAACAGGGCG GCAATATTGT CAGCAGTATC GGCAACCTGC TGCTGCCGCC

501 CTTGCTGCTT TACTATTTCC TGCTGGATTG GCAGCGGTGG TCGTGCGGCA 

551 TCGCCAAACT GGTTCCGAGG CGTTTTGCCG GTGCTTATAC GCGCATTACG 

601 GGTAATTTGA ACGAGGTATT GGGCGAATTT TTGCGCGGTC AGCTTCTGGT 

651 GATGCTGATT ATGGGCTTGG TTTACGGTTT GGGATTGATG CTAGTCGGAC 

701 TGGATTCGGG ATTTGCCATC GGTATGGTTG CCGGTATTTT GGTGTTTGTC 

751 CCCTATTTGG GTGCGTTTAC GGGATTGCTG CTTGCCACTG TTGCAGCCTT

801 GCTCCAGTTC GGTTCGTGGA ACGGAATCTT GGCTGTTTGG GCGGTTTTTG 

851 CCGTCGGTCA GTTTCTCGAA AGTTTTTTCA TTACGCCGAA AATTGTAGGA 

901 GACCGTATCG GCCTGTCGCC GTTTTGGGTT ATCTTTTCGC TGATGGCGTT 

951 CGGAGAGCTG ATGGGCTTTG TCGGAATGTT GGCCGGATTG CCTTTGGCCG 

1001 CCGTAACCTT GGTCTTGCTT CGCGAGGGCG CGCAGAAATA TTTTGCCGGC 

1051 AGTTTTTACC GGGGCAGGTA G 

This corresponds to the amino acid sequence <SEQ ID 790; ORF121ng-1>:

1 MYRRKGRGIK PWMGAGAAFA ALVWLVYALG DTLTPFAVAA VLAYVLDPLV

51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL

101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW FQAHTGELSN ALKAWFPVLM

151 KQGGNIVSSI GNLLLPPLLL YYFLLDWQRW SCGIAKLVPR RFAGAYTRIT

201 GNLNEVLGEF LRGQLLVMLI MGLVYGLGLM LVGLDSGFAI GMVAGILVFV

251 PYLGAFTGLL LATVAALLQF GSWNGILAVW AVFAVGQFLE SFFITPKIVG

301 DRIGLSPFWV IFSLMAFGEL MGFVGMLAGL PLAAVTLVLL REGAQKYFAG

351 SFYRGR*

ORF121ng-l and ORF121-1 show 97.5% identity in 356 aa overlap: 

                      10        20        30        40        50        60
orf121-1.pep  MYRRKGRGIKPWMGAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR
              ||||||||||||||||||||||||||:|||||||||||||||||||||||||||||||||
orf121ng-1    MYRRKGRGIKPWMGAGAAFAALVWLVYALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf121-1.pep  ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf121ng-1    ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf121-1.pep  EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNIVSSIGNLLLLPLLLYYFLLDWQRW
              ||||||||||:|||||||||||||||||||:|||||||||||||| ||||||||||||||
orf121ng-1    EIDQASIIAWFQAHTGELSNALKAWFPVLMKQGGNIVSSIGNLLLPPLLLYYFLLDWQRW
                     130       140       150       160       170       180

                     190       200       210       220       230       240
orf121-1.pep  SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLVLVGLDSGFAI
              |||||||||||||||||||||||||||||||||||||||||||||||||:||||||||||
orf121ng-1    SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLMLVGLDSGFAI
                     190       200       210       220       230       240

                     250       260       270       280       290       300
orf121-1.pep  GMLAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILSVWAVFAVGQFLESFFITPKIVG
              ||:||||||||||||||||||||||||||||||||||:||||||||||||||||||||||
orf121ng-1    GMVAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILAVWAVFAVGQFLESFFITPKIVG
                     250       260       270       280       290       300

                     310       320       330       340       350
orf121-1.pep  DRIGLSPFWVIFSLMAFGQLMGFVGMLAGLPLAAVTLVLLREGVQKYFAGSFYRGRX
              ||||||||||||||||||:||||||||||||||||||||||||:|||||||||||||
orf121ng-1    DRIGLSPFWVIFSLMAFGELMGFVGMLAGLPLAAVTLVLLREGAQKYFAGSFYRGRX
                     310       320       330       340       350

In addition, ORF121ng-l shows homology to a permease from H. influenzae: 

sp|P43969|PERM_HAEIN PUTATIVE PERMEASE PERM HOMOLOG Length = 349
Score = 69.9 bits (168), Expect = 2e-11

Identities = 67/317 (21%), Positives = 120/317 (37%), Gaps = 7/317 (2%)



Query: 26 VYALGDTLTPFAVAAVLAYVLDPLVEWL-QKKGLNRASASMSVMVFSXXXXXXXXXXXVP 84

+Y GD + P +A VL+Y+L+ + +L Q R A++ + VP 

Sbjct: 32 IYFFGDLIAPLLIALVLSYLLEIPINFLNQYLKCPRMLATILIFGSFIGLAAVFFLVLVP 91 



Query: 85 MLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYVE-IDQASIIAWFQAHTGELSNALK 143

ML Q +L S LP + N WL N YEID+++F+ ++ + 

Sbjct: 92 MLWNQTI SLLS DLPAMF NKSNEWLLNLPKNYPELIDYSMVDSIFNSVREKILGFGE 147

Query: 144 AWFPVLMKQGGNIVSSIGNXXXXXXXXXXXXXDWQRWSCGIAKLVPRRFAGAYTRITGNL 203

+ + + N+VS D G+++ + P+ A+ R +

Sbjct: 148 SAVKLSLASIMNLVSLGIYAFLVPLMMFFMLKDKSELLQGVSRFLPKNRNLAFXRWK-EM 206 

Query: 204 NEVLGEFLRGQXXXXXXXXXXXXXXXXXXXXDSGFAIGMVAGILVFVPYXXXXXXXXXXX 263 
+■ + ++ G+ + + G+ V VPY 

Sbjct: 207 QQQISNYIHGKLLEILIVTLITYIIFLIFGLNYPLLLAFAVGLSVLVPYIGAVIVTIPVA 266

Query: 264 XXXXXQFGSWNGILAVWAVFAVGQFLESFFITPKIVGDRIGLSPFWVIFSLMAFGELMGF 323 

QFG + FAV QL+ +P+ ++LP +1 S++ FG L GF 

Sbjct: 267 LVALFQFGISPTFWYIIIAFAVSQLLDGNLLVPYLFSEAVNLHPLIIIISVLIFGGLWGF 326 



Query: 324 VGMLAGLPLAAVTLVLL 340

G+ +PLA + ++
Sbjct: 327 WGVFFAIPLATLVKAVI 343
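
The percentages in the summary line of this comparison follow from the raw counts over the 317 aligned columns. The sketch below assumes they are whole-number percentages rounded down; that rule is inferred from the three reported figures (120/317 is about 37.9%, reported as 37%) and is not stated in the source. `blast_percent` is a hypothetical name.

```python
def blast_percent(count, length):
    """Whole-number percentage, rounded down (an inferred convention)."""
    return (100 * count) // length


ALIGNMENT_LENGTH = 317  # aligned columns, including gap positions
COUNTS = {"Identities": 67, "Positives": 120, "Gaps": 7}

summary = ", ".join(
    f"{label} = {count}/{ALIGNMENT_LENGTH} "
    f"({blast_percent(count, ALIGNMENT_LENGTH)}%)"
    for label, count in COUNTS.items()
)
print(summary)
```

The printed line reproduces the summary given above, so about one residue in five is identical between the two proteins across the aligned region.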



Based on this analysis, including the presence of a putative leader sequence and transmembrane
domains in the two proteins, it is predicted that the proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.
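
The source does not say which method was used to predict the transmembrane domains. One common heuristic is a Kyte-Doolittle hydropathy scan; the sketch below assumes a 19-residue window and a threshold of 1.6, both conventional choices and not values taken from this document. The function names are hypothetical, and unknown residues such as `X` are scored as 0.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD.get(residue, 0.0) for residue in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_windows(protein, window=19, threshold=1.6):
    """1-based start positions of windows at or above the threshold."""
    profile = hydropathy_profile(protein, window)
    return [i + 1 for i, mean in enumerate(profile) if mean >= threshold]


# Residues 61-100 of ORF121-1, copied from the sequence above.
segment = "ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQL"
print(hydrophobic_windows(segment))
```

A run of consecutive start positions above the threshold marks a candidate membrane-spanning stretch; any such call would still need confirmation by a dedicated predictor or by experiment.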



Example 94 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 791>:



1 . ACTGCTTTTT CGGCGGCGCT GCGCTTGAGT CCATCATGAC TCGTCATATT
51 TTTGTCCTTT GGGAAACCGT ATCAACAAAC AGCCGCCATC TTAACATTTT
101 TTTGCACGTC CTGCCCGCCG CGTTCAAATG CGTACCAGCA ATACCGCCGC
151 CTGCGCCTCT ATGCCTTCCA TCCGCCCGAG ATAGCCGAGT TTTTCGTTGG
201 TTTTGCCTTT GATGTTGACG CACGAAATGT CTATGCCCAA ATCGGCGGCG
251 ATGTTGGCAC GCATTTGCGG AATGTGCGGC GCGAGTGTGG GTTTCTGTGC
301 AATCACGGTC GTATCGACAT TGACCGCCTG CCAACCCTGC GCCTGAACGC
351 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT
401 GCGGCGGTGT CGGGGAAATG GCTGCCGATA TCGCCCAAAC CTGCCGCACC
451 GAGCAGCGCG TCGGTAACGG CGTGCAGCAG CGCATCGGCA TCGGAGTGTC
501 CGAGCAGCCC TTTTTCAAAT GGGATTTCAA CTCCGCCAAG TATCAG . .

This corresponds to the amino acid sequence <SEQ ID 792; ORF122>: 

1 . .TAFSAALRLS PSXLVIFLSF GKPYQQTAAI LTFFCTSCPP RSNAYQQYRR 

51 LRLYAFHPPE IAEFFVGFAF DVDARNVYAQ IGGDVGTHLR NVRRECGFLC 

101 NHGRIDIDRL PTLRLNALIR RTQKDAAVRI FELCGGVGEM AADIAQTCRT 

151 EQRVGNGVQQ RIGIGVSEQP FFKWDFNSAK YQ. . 

Further work revealed the complete nucleotide sequence <SEQ ID 793>: 



1 ATATCGTACT GGGCAAGCAG TTCGCCGGAT TTTTTGGAAG TAGATACCGC
51 GCCTTTGATT TTTTTGCCGC TCTTACCCAA GGCTTCGATG AAAAAGTTGA
101 TGGTCGAGCC GGTACCGATG CCGATATATT CATTTTCGGG TACGAATTCG
151 ACTGCTTTTT CGGCGGCGAT GCGCTTGAGT TCGTCTTGTG TCGTCATATT
201 TTTGTCCTTT GGGAAACCGT ATCAACAAAC AGCCGCCATC TTAACATTTT
251 TTTGCACGTC CTGCCCGCCG CGTTCAAATG CGTACCAGCA ATACCGCCGC
301 CTGCGCCTCT ATGCCTTCCA TCCGCCCGAG ATAGCCGAGT TTTTCGTTGG
351 TTTTGCCTTT GATGTTGACG CACGAAATGT CTATGCCCAA ATCGGCGGCG
401 ATGTTGGCAC GCATTTGCGG AATGTGCGGC GCGAGTTTGG GTTTCTGTGC
451 AATCACGGTC GTATCGACAT TGACCGCCTG CCAACCCTGC GCCTGAACGC
501 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT
551 GCGGCGGTGT CGGGGAAATG GCTGCCGATA TCGCCCAAAC CTGCCGCACC
601 GAGCAGCGCG TCGGTAACGG CGTGCAGCAG CGCATCGGCA TCGGAGTGTC
651 CGAGCAGCCC TTTTTCAAAT GGGATTTCAA CTCCGCCAAG TATCAGCTTT
701 CTGCCTTCGG TCAGTTGGTG GACATCGTAG CCCTGTCCGA TACGGATGTT
751 CGTCATCGTT TGTGTTCCTG A



This corresponds to the amino acid sequence <SEQ ID 794; ORF122-1>:

1 ISYWASSSPD FLEVDTAPLI FLPLLPKASM KKLMVEPVPM PIYSFSGTNS
51 TAFSAAMRLS SSCVVIFLSF GKPYQQTAAI LTFFCTSCPP RSNAYQQYRR
101 LRLYAFHPPE IAEFFVGFAF DVDARNVYAQ IGGDVGTHLR NVRREFGFLC
151 NHGRIDIDRL PTLRLNALIR RTQKDAAVRI FELCGGVGEM AADIAQTCRT
201 EQRVGNGVQQ RIGIGVSEQP FFKWDFNSAK YQLSAFGQLV DIVALSDTDV
251 RHRLCS*



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF122 shows 94.0% identity over a 182 aa overlap with an ORF (ORF122a) from strain A of N.
meningitidis:

                                                    10        20        30
orf122.pep                                  TAFSAALRLSPSXLVIFLSFGKPYQQTAAI
                                            ||||||:||| | :||||||||||||||||
orf122a       FLPLLPKASMKKLMVEPVPMPMYSFSGTNSTAFSAAMRLSSSCVVIFLSFGKPYQQTAAI
                      30        40        50        60        70        80

                      40        50        60        70        80        90
orf122.pep    LTFFCTSCPPRSNAYQQYRRLRLYAFHPPEIAEFFVGFAFDVDARNVYAQIGGDVGTHLR
              |||| |||||||| ||||||||||||| |||:|||||||| |||||||||||||||||||
orf122a       LTFFXTSCPPRSNPYQQYRRLRLYAFHAPEITEFFVGFAFXVDARNVYAQIGGDVGTHLR
                      90       100       110       120       130       140

                     100       110       120       130       140       150
orf122.pep    NVRRECGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRIFELCGGVGEMAADIAQTCRT
              |:||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf122a       NMRREFGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRIFELCGGVGEMAADIAQTCRT
                     150       160       170       180       190       200

                     160       170       180
orf122.pep    EQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQ
              ||||||||||||||||||||||||||||||||
orf122a       EQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLVDIVALSDTDVRHRLCSX
                     210       220       230       240       250

The complete length ORF122a nucleotide sequence <SEQ ID 795> is: 

1 ATATCATATT GGGCAAGCAG TTCACTGGAT TTTTTGGAAG TAGATACCGC

51 GCCTTTGATT TTTTTGCCGC TCTTACCCAA GGCTTCGATG AAAAAGTTGA 

101 TGGTCGAACC GGTACCGATG CCGATGTATT CGTTTTCGGG TACGAATTCG 

151 ACTGCNTTTT CGGCGGCGAT GCGCTTGAGT TCGTCTTGTG TCGTCATATT 

201 TTTGTCCTTT GGGAAACCGT ATCAACAAAC AGCCGCCATC TTAACATTTT 

251 TTNNNACGTC CTGCCCGCCG CGTTCAAATC CTTACCAGCA ATACCGCCGC 

301 CTGCGACTCT ATGCCTTCCA TGCGCCCGAG ATAACCGAGT TTTTCGTTGG 

351 TTTTGCCTTT GANGTTGACG CACGAAATGT CTATGCCCAA ATCGGCGGCG 

401 ATGTTGGCAC GCATTTGCGG AATATGCGGC GCGAGTTTGG GTTTCTGTGC 

451 AATCACGGTC GTATCGACAT TGACCGCCTG CCAACCCTGC GCCTGAACGC

501 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT 

551 GCGGCGGTGT CGGGGAAATG GCTGCCGATA TCGCCCAAAC CTGCCGCACC 

601 GAGCAGCGCG TCGGTAACGG CGTGCAGCAG CGCATCGGCA TCGGAGTGTC 

651 CGAGCAGCCC TTTTTCAAAT GGGATTTCAA CTCCGCCAAG TATCAGCTTT 

701 CTGCCTTCGG TCAGTTGGTG GACATCGTAG CCCTGTCCGA TACGGATGTT 

751 CGTCATCGTT TGTGTTCCTG A

This encodes a protein having amino acid sequence <SEQ ID 796>: 

1 ISYWASSSLD FLEVDTAPLI FLPLLPKASM KKLMVEPVPM PMYSFSGTNS 

51 TAFSAAMRLS SSCVVIFLSF GKPYQQTAAI LTFFXTSCPP RSNPYQQYRR

101 LRLYAFHAPE ITEFFVGFAF XVDARNVYAQ IGGDVGTHLR NMRREFGFLC 

151 NHGRIDIDRL PTLRLNALIR RTQKDAAVRI FELCGGVGEM AADIAQTCRT 

201 EQRVGNGVQQ RIGIGVSEQP FFKWDFNSAK YQLSAFGQLV DIVALSDTDV 

251 RHRLCS* 

ORF122a and ORF122-1 show 96.9% identity in 256 aa overlap: 

                      10        20        30        40        50        60
orf122a.pep   ISYWASSSLDFLEVDTAPLIFLPLLPKASMKKLMVEPVPMPMYSFSGTNSTAFSAAMRLS
              |||||||| ||||||||||||||||||||||||||||||||:||||||||||||||||||
orf122-1      ISYWASSSPDFLEVDTAPLIFLPLLPKASMKKLMVEPVPMPIYSFSGTNSTAFSAAMRLS
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf122a.pep   SSCVVIFLSFGKPYQQTAAILTFFXTSCPPRSNPYQQYRRLRLYAFHAPEITEFFVGFAF
              |||||||||||||||||||||||| |||||||| ||||||||||||| |||:||||||||
orf122-1      SSCVVIFLSFGKPYQQTAAILTFFCTSCPPRSNAYQQYRRLRLYAFHPPEIAEFFVGFAF
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf122a.pep   XVDARNVYAQIGGDVGTHLRNMRREFGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRI
               ||||||||||||||||||||:||||||||||||||||||||||||||||||||||||||
orf122-1      DVDARNVYAQIGGDVGTHLRNVRREFGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRI
                     130       140       150       160       170       180

                     190       200       210       220       230       240
orf122a.pep   FELCGGVGEMAADIAQTCRTEQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLV
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf122-1      FELCGGVGEMAADIAQTCRTEQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLV
                     190       200       210       220       230       240

                     250
orf122a.pep   DIVALSDTDVRHRLCSX
              |||||||||||||||||
orf122-1      DIVALSDTDVRHRLCSX
                     250

Homology with a predicted ORF from N. gonorrhoeae

ORF122 shows 89.6% identity over a 182 aa overlap with a predicted ORF (ORF122ng) from
N. gonorrhoeae:

orf122.pep                                  TAFSAALRLSPSXLVIFLSFGKPYQQTAAI  30
                                            ||||||:||| | :||||||||||||||||
orf122ng      FLPLLPKASMKKLMVEPVPMPMYSFSGTNSTAFSAAMRLSSSCVVIFLSFGKPYQQTAAI  80

orf122.pep    LTFFCTSCPPRSNAYQQYRRLRLYAFHPPEIAEFFVGFAFDVDARNVYAQIGGDVGTHLR  90
              ||||||| ||||| |||||||||||||||||||||||||||:||||: :|||||||||||
orf122ng      LTFFCTSWPPRSNPYQQYRRLRLYAFHPPEIAEFFVGFAFDIDARNIDTQIGGDVGTHLR  140

orf122.pep    NVRRECGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRIFELCGGVGEMAADIAQTCRT  150
              ||| | ||||||||||||:|||||||||||||||||||||||||||||:||||:||||||
orf122ng      NVRCEFGFLCNHGRIDIDHLPTLRLNALIRRTQKDAAVRIFELCGGVGKMAADVAQTCRT  200

orf122.pep    EQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQ  182
              |||||||||||:|| : |||||||||||||||
orf122ng      EQRVGNGVQQRVGIRMPEQPFFKWDFNSAKYQLSAFGQLVDIVALSDTDIRHRLCS  256



The complete length ORF122ng nucleotide sequence <SEQ ID 797> is: 



1 ATGTCGTACC GGGCAAGCAG TTCGCCGGAT TTTTTGGAGG TTGAAACCGC
51 GCCTTTGATT TTTTTACCGC TTTTGCCCAA GGCTTCGATG AAGAAATTGa
101 tgGTCGAACC GgtaCCGATG CCGATGTATT CGTTTTCGGG TACGAATTCG
151 ACTGCTTTTT CGGCGGCGAT GCGCttgAgt TCgtcttgcg TcgTCATATT
201 TTTAtccttt gGGAAaccct atcaAcaAAc agccgccatC TTAACATTTT
251 TTTGCACGtc ctggccgccg cgttcaAATc cgtaccaGca ataccgccgc
301 ctgcgcctCT AtgcCTTCCA TCCGCCCGAG ATAGCCGAGT TTTTCGTTGG
351 TTTTGCCTTT GATatTGACG CACGAAATAT CGatacCCAa atcggcgGCG
401 ATGTTGGCAC GCATTTGCGG AATGTGCGGT GCGAGTTTGG GTTTCTGTGC
451 AATCACGGTC GTATCGACAT TGACCACCTG CCAACCCTGC GCCTGAACGC
501 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT
551 GCGGCGGTGT CGGGAAAATG GCTGCCGATG TCGCCCAAAC CTGCCGCACC
601 GAGCAGCgcg tcggtaaCGG CGTGCAGCAG cgcgTcgGCA TCCGAATGCC
651 CGAGCAGCCC TTTTTCAAAT GGGATTTCAA CTCCGCCAAG TATCAGCTTT
701 CTGCCTTCGG TCAATTGGTG GACATCGTAG CCCTGTCCGA TACGGATATT
751 CGTCATCGTT TGTGTTCCTG A



This encodes a protein having amino acid sequence <SEQ ID 798>: 



1 MSYRASSSPD FLEVETAPLI FLPLLPKASM KKLMVEPVPM PMYSFSGTNS
51 TAFSAAMRLS SSCVVIFLSF GKPYQQTAAI LTFFCTSWPP RSNPYQQYRR
101 LRLYAFHPPE IAEFFVGFAF DIDARNIDTQ IGGDVGTHLR NVRCEFGFLC
151 NHGRIDIDHL PTLRLNALIR RTQKDAAVRI FELCGGVGKM AADVAQTCRT
201 EQRVGNGVQQ RVGIRMPEQP FFKWDFNSAK YQLSAFGQLV DIVALSDTDI
251 RHRLCS*



ORF122ng and ORF122-1 show 92.6% identity in 256 aa overlap:

                      10        20        30        40        50        60
orf122-1.pep  ISYWASSSPDFLEVDTAPLIFLPLLPKASMKKLMVEPVPMPIYSFSGTNSTAFSAAMRLS
              :|| ||||||||||:||||||||||||||||||||||||||:||||||||||||||||||
orf122ng      MSYRASSSPDFLEVETAPLIFLPLLPKASMKKLMVEPVPMPMYSFSGTNSTAFSAAMRLS
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf122-1.pep  SSCVVIFLSFGKPYQQTAAILTFFCTSCPPRSNAYQQYRRLRLYAFHPPEIAEFFVGFAF
              ||||||||||||||||||||||||||| ||||| ||||||||||||||||||||||||||
orf122ng      SSCVVIFLSFGKPYQQTAAILTFFCTSWPPRSNPYQQYRRLRLYAFHPPEIAEFFVGFAF
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf122-1.pep  DVDARNVYAQIGGDVGTHLRNVRREFGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRI
              |:||||: :|||||||||||||| ||||||||||||||:|||||||||||||||||||||
orf122ng      DIDARNIDTQIGGDVGTHLRNVRCEFGFLCNHGRIDIDHLPTLRLNALIRRTQKDAAVRI
                     130       140       150       160       170       180

                     190       200       210       220       230       240
orf122-1.pep  FELCGGVGEMAADIAQTCRTEQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLV
              ||||||||:||||:|||||||||||||||||:|| : |||||||||||||||||||||||
orf122ng      FELCGGVGKMAADVAQTCRTEQRVGNGVQQRVGIRMPEQPFFKWDFNSAKYQLSAFGQLV
                     190       200       210       220       230       240

                     250
orf122-1.pep  DIVALSDTDVRHRLCSX
              |||||||||:|||||||
orf122ng      DIVALSDTDIRHRLCSX
                     250

Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 95 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 799>:

1 . . GCCGGCGCGA GTGCGAACAA CATTTCCGCG CGTTTTGCGG AAACACCCGT 

51 CGCTGTCAGC GTTACCCTGA TCGGCACGGT ACTTGCCGTC ATGCTGCCCG 

101 TTACCGAATA TGAAAACTTC CTGCTGCTTA TCGGCTCGGT ATTTGCGCCG 

151 ATGGGGCGGA TTTTGATTGC CGACTTTTTC GTCTTGAAAC GGCGTGA 

This corresponds to the amino acid sequence <SEQ ID 800; ORF125>:

1 . .AGASANNISA RFAETPVAVS VTLIGTVLAV MLPVTEYENF LLLIGSVFAP 
51 MGGFDCRLFR LETA* 



Further work revealed the complete nucleotide sequence <SEQ ID 801>:



1 ATGTCGGGCA ATGCCTCCTC TCCTTCATCT TCCTCCGCCA TCGGGCTGAT 

51 TTGGTTCGGC GCGGCGGTAT CGATTGCCGA AATCAGCACG GGTACGCTGC

101 TTGCGCCTTT GGGCTGGCAG CGCGGTCTGG CGGCTCTACT TTTGGGTCAT 

151 GCCGTCGGCG GCGCGCTGTT TTTTGCGGCG GCGTATATCG GCGCACTGAC 

201 CGGACGCAGC TCGATGGAAA GCGTGCGCCT GTCGTTCGGC AAACGCGGTT 

251 CAGTGCTGTT TTCCGTGGCG AATATGCTGC AACTGGCCGG CTGGACGGCG 

301 GTGATGATTT ACGCCGGCGC AACGGTCAGC TCCGCTTTGG GCAAAGTGTT

351 GTGGGACGGC GAATCTTTTG TCTGGTGGGC ATTGGCAAAC GGCGCGCTGA 

401 TTGTGCTGTG GCTGGTTTTC GGCGCACGCA AAACAGGCGG GCTGAAAACC 

451 GTTTCGATGC TGCTGATGCT GTTGGCGGTT CTGTGGCTGA GTGCCGAAGT 

501 CTTTTCCACG GCAGGCAGCA CCGCCGCACA GGTTTCAGAC GGCATGAGTT 

551 TCGGAACGGC AGTCGAGCTG TCCGCCGTGA TGCCGCTTTC CTGGCTGCCG

601 CTTGCCGCCG ACTACACGCG CCACGCGCGC CGCCCGTTTG CGGCAACCCT 

651 GACGGCAACG CTCGCCTACA CGCTGACCGG CTGCTGGATG TATGCCTTGG 

701 GTTTGGCAGC GGCGTTGTTC ACCGGAGAAA CCGACGTGGC AAAAATCCTG 

751 CTGGGCGCAG GTTTGGGTGC GGCAGGCATT TTGGCGGTCG TCCTCTCCAC 

801 CGTTACCACA ACGTTTCTCG ATGCCTATTC CGCCGGCGCG AGTGCGAACA

851 ACATTTCCGC GCGTTTTGCG GAAACACCCG TCGCTGTCGG CGTTACCCTG 

901 ATCGGCACGG TACTTGCCGT CATGCTGCCC GTTACCGAAT ATGAAAACTT 

951 CCTGCTGCTT ATCGGCTCGG TATTTGCGCC GATGGCGGCG GTTTTGATTG 

1001 CCGACTTTTT CGTCTTGAAA CGGCGTGAGG AGATTGAAGG CTTTGACTTT 

1051 GCCGGACTGG TTCTGTGGCT TGCGGGCTTC ATCCTCTACC GCTTCCTGCT

1101 CTCGTCCGGC TGGGAAAGCA GCATCGGTCT GACCGCCCCC GTAATGTCTG 

1151 CCGTTGCCAT TGCCACCGTA TCGGTACGCC TTTTCTTTAA AAAAACCCAA 

1201 TCTTTACAAA GGAACCCGTC ATGA 

This corresponds to the amino acid sequence <SEQ ID 802; ORF125-1>:

1 MSGNASSPSS SSAIGLIWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH
51 AVGGALFFAA AYIGALTGRS SMESVRLSFG KRGSVLFSVA NMLQLAGWTA
101 VMIYAGATVS SALGKVLWDG ESFVWWALAN GALIVLWLVF GARKTGGLKT
151 VSMLLMLLAV LWLSAEVFST AGSTAAQVSD GMSFGTAVEL SAVMPLSWLP
201 LAADYTRHAR RPFAATLTAT LAYTLTGCWM YALGLAAALF TGETDVAKIL
251 LGAGLGAAGI LAVVLSTVTT TFLDAYSAGA SANNISARFA ETPVAVGVTL
301 IGTVLAVMLP VTEYENFLLL IGSVFAPMAA VLIADFFVLK RREEIEGFDF
351 AGLVLWLAGF ILYRFLLSSG WESSIGLTAP VMSAVAIATV SVRLFFKKTQ
401 SLQRNPS*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF125 shows 76.5% identity over a 51 aa overlap with an ORF (ORF125a) from strain A of N. meningitidis:



                                                  10        20        30
orf125.pep                                AGASANNISARFAETPVAVSVTLIGTVLAV
                                          ||:|||||||:::| |:||:|:::||:|||
orf125a     KILLGAGLGAAGILAVVLSTVTTTFLDAYSAGVSANNISAKLSEIPIAVAVAVVGTLLAV
            250       260       270       280       290       300

                    40        50        60
orf125.pep  MLPVTEYENFLLLIGSVFAPMGGFDCRLFRLETAX
            :||||||||||||||||||||:
orf125a     LLPVTEYENFLLLIGSVFAPMAAVLIADFFVLKRREEIEG
            310       320       330       340
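
The identity figure quoted for this overlap can be checked directly from the aligned residues. The following Python sketch is illustrative only and is not part of the original analysis: the function name `percent_identity` is hypothetical, the overlap is assumed to be ungapped, and the two strings are the 51 overlapping residues of ORF125 and ORF125a as read from the sequences given in this example.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned columns that hold the same residue.

    Assumes the two strings are already aligned column by column
    (equal length); gap characters ('-') never count as matches.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    if not a:
        raise ValueError("empty alignment")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


# 51-residue overlap between ORF125 (partial) and ORF125a, as read from the text.
ORF125_OVERLAP = "AGASANNISARFAETPVAVSVTLIGTVLAVMLPVTEYENFLLLIGSVFAPM"
ORF125A_OVERLAP = "AGVSANNISAKLSEIPIAVAVAVVGTLLAVLLPVTEYENFLLLIGSVFAPM"

if __name__ == "__main__":
    value = percent_identity(ORF125_OVERLAP, ORF125A_OVERLAP)
    print(f"{value:.1f}% identity over {len(ORF125_OVERLAP)} aa")
```

With these strings, 39 of the 51 columns match, which rounds to the 76.5% stated above.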



The ORF125a partial nucleotide sequence <SEQ ID 803> is: 



1 ATGTCGGGCA ATGCCTCCTC TCNTTCATCT TCCGCCGCCA TCGGGCTGAT
51 TTGGTTCGGC GCGGCGGTAT CGATTGCCGA AATCAGCACG GGTACACTGC
101 TTGCGCCTTT GGGCTGGCAG CGCGGTCTGG CNGCTCTGCT TTTGGGTCAT
151 GCCGTCGGCG GCGCGCTGTT TTTTGCGGCG GCGTATATCG GCGCACTGAC
201 CGGACNCANC TCGATGGAAA GCGTGCGCCT GTCGTTCGGC AAACGCGGTT
251 CAGTGCTGTT TTCCGTGGCG AATATGCTGC AACTGGCCGG CTGGACGGCG
301 GTGATGATTT ACGCCGGCGC AACGGTCAGC TCCGCTTTGG GCAAAGTGTT
351 GTGGGACGGC GAATCTTTTG TCTGGTGGGC ATTGGCAAAC GGCGCGCTGA
401 TTGTGCTGTG GCTGGTTTTC GGCGCACGCA AAACAGGCGG GCTGAAAACC
451 GTTTCGATGC TGCTGATGCT GTTGGCGGTT CTGTGGCTGA GTGCCGAANT
501 NTTTTCCACG GCAGGCAGCA CCGCCGCANN GGTNNCAGAC GGCATGAGTT
551 TCGGAACGGC AGTCGAGCTG TCCGCCGTNA TGCCGCTTTC TTGGCTGCCG
601 CTGGCCGCCG ACTACACGCG CCACGCGCGC CGCCCGTTTG CGGCAACCCT
651 GACGGCAACG CTCGCCTACA CGCTGACCGG CTGCTGGATG TATGCCTTGG
701 GTTTGGCAGC GGCGTTGTTC ACCGGAGAAA CCGACGTGGC AAAAATCCTG
751 CTGGGCGCAG GTTTGGGTGC GGCAGGCATT TTGGCGGTCG TCCTGTCGAC
801 CGTTACCACC ACTTTTCTCG ATGCNTACTC CGCCGGCGTA AGTGCCAACA
851 ATATTTCCGC CAAACTTTCG GAAATACCNA TCGCCGTTGC CGTCGCCGTT
901 GTCGGCACAC TGCTTGCCGT CCTCCTGCCC GTTACCGAAT ATGAAAACTT
951 CCTGCTGCTT ATCGGCTCGG TATTTGCGCC GATGGCGGCG GTTTTGATTG
1001 CCGACTTTTT CGTCTTGAAA CGGCGTGAGG AGATTGAAGG C..



This encodes a protein having the partial amino acid sequence <SEQ ID 804>: 



1 MSGNASSXSS SAAIGLIWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH
51 AVGGALFFAA AYIGALTGXX SMESVRLSFG KRGSVLFSVA NMLQLAGWTA
101 VMIYAGATVS SALGKVLWDG ESFVWWALAN GALIVLWLVF GARKTGGLKT
151 VSMLLMLLAV LWLSAEXFST AGSTAAXVXD GMSFGTAVEL SAVMPLSWLP
201 LAADYTRHAR RPFAATLTAT LAYTLTGCWM YALGLAAALF TGETDVAKIL
251 LGAGLGAAGI LAVVLSTVTT TFLDAYSAGV SANNISAKLS EIPIAVAVAV
301 VGTLLAVLLP VTEYENFLLL IGSVFAPMAA VLIADFFVLK RREEIEG..

ORF125a and ORF125-1 show 94.5% identity in 347 aa overlap:



10 20 30 40 50 60 

orf 125a. pep MSGNAS S XS S SAAIGLI WFGAAVS I AE I STGTLLAPLGWQRGLAALLLGHAVGGALFFAA 
I I I I It I I I I : M I I ( I I I I I i I I I t I I I I II ! I I I I M I It I i I I t I t t i ( I i I I I ( I 
orf 125-1 MSGNASSPSSSSAIGLIWFGAAVSIAEISTGTLLAPLGWQRGLAALLLGHAVGGALFFAA 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 125a . pep AYIGALTGXXSMESVRLSFGKRGSVLFSVANMLQLAGWTAVMIYAGATVSSALGKVLWDG 
I I I I I I I I II I I I I I I I I I I I I I I I I I I I M I I I I II I I M I M I I I I I I I I I I I I I I 
orf 125-1 AYIGALTGRSSMESVRLSFGKRGSVLFSVANMLQLAGWTAVMIYAGATVSSALGKVLWDG 

70 80 90 100 110 120 



orf 125a. pep 



130 140 150 160 170 180 

ESFVWWALANGALIVLWLVFGARKTGGLKTVSMLLMLLAVLWLSAEXFSTAGSTAAXVXD 
M I ! | | | | ) II I I I I I I 1 1 I II I I I ! I I I I II II I I I I I I i N I i I IMHMII I I
orf 125-1 ESFVWWALANGALIVLWLVFGARKTGGLKTVSMLLMLLAVLWLSAEVFSTAGSTAAQVSD
130 140 150 160 170 180

190 200 210 220 230 240 

orf 125a pep GMSFGTAVELSAVMPLSWLPLAADYTRHARRPFAATLTATLAYTLTGCWMYALGLAAALF 
M I ! I I M II N I I I N N I N I M N N M M II M 1 I I I M I i I I I I N II I I M I I I 
orf 125-1 GMSFGTAVELSAVMPLSWLPLAADYTRHARRPFAATLTATLAYTLTGCWMYALGLAAALF 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 125a pep T GE T D VAK I L LG AG L G AAG I LA WLS T VT T T F L DAY S AG V S ANN I S AKL SE I P I AV AV AV 
M I I I M II M M 1 M I II M 1 M II I I I ( ( I I I 1 M t I : I i t I t 1 I : : I MINI:: 
orf 125-1' TG E T D V AK I L LG AG L GAAG I LAW L S T VT T T FLD A Y SAGAS ANN I S AR FAE T P VAVG VT L 

250 260 270 280 290 300 

310 320 330 340 

orf 125a pep VGTLLAVLLPVTEYENFLLLIGSVFAPMAAVLIADFFVLKRREEIEG 
: M : I 11 : 1 I M II N I N M I M II N N II N M II N I I I N I I 
orf 125-1 IGTVLAVMLPVTEYENFLLLIGSVFAPMAAVLIADFFVLKRREEIEGFDFAGLVLWLAGF 
310 320 330 340 350 360 



Homology with a predicted ORF from N. gonorrhoeae

ORF125 shows 86.2% identity over a 65 aa overlap with a predicted ORF (ORF125ng) from
N. gonorrhoeae:

orf125.pep                                AGASANNISARFAETPVAVSVTLIGTVLAV  30
                                          |||||||||||||| ||||:|||| |||||
orf125ng    KILLGAGLGITGILAVVLSTVTTTFLDTYSAGASANNISARFAEIPVAVGVTLIRTVLAV 308

orf125.pep  MLPVTEYENFLLLIGSVFAPM-GGFDCRLFRLETA  64
            |||||||:|||||| ||| || |||||||| |:||
orf125ng    MLPVTEYKNFLLLIRSVFGPMAGGFDCRLFCLKTA 343

An ORF125ng nucleotide sequence <SEQ ID 805> was predicted to encode a protein having amino
acid sequence <SEQ ID 806>:



1 MSGNASSPSS SAAIGLVWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH
51 AVGGALFFAA AYIGALTGRS SMESVRLSFG KCGSVLFSVA NMLQLAGWTA
101 VMIYVGATVS SALGKVLWDG ESFVWWALAN GALIVLWLVF GARRTGGLKT
151 VSMLLMLLAV LWLSVEVFAS SGTNAAPAVS DGMTFGTAVE LSAVMPLSWL
201 PLAADYTRQA RRPFAATLTA TLAYTLTGCW MYALGLAAAL FTGETDVAKI
251 LLGAGLGITG ILAVVLSTVT TTFLDTYSAG ASANNISARF AEIPVAVGVT
301 LIRTVLAVML PVTEYKNFLL LIRSVFGPMA GGFDCRLFCL KTA*

Further work revealed the following gonococcal DNA sequence <SEQ ID 807>: 



1 ATGTCGGGCA ATGCCTCCTC TCCTTCATCT TCCGCCGCCA TCGGGCTGGT
51 TTGGTTCGGC GCGGCGGTAT CGATTGCCGA AATCAGCACG GGTACGCTGC
101 TCGCCCCCTT GGGCTGGCAG CGCGGTCTGG CGGCCCTGCT TTTGGGTCAT
151 GCCGTCGGCG GCGCGCTGTT TTTTGCGGCG GCGTATATCG GCGCACTGAC
201 CGGACGCAGC TCGATGGAAA GTGTGCGCCT GTCGTTCGGC AAATGCGGTT
251 CAGTGCTGTT TTCCGTGGCG AATATGCTGC AACTGGCCGG CTGGACGGCG
301 GTGATGATTT ACGTCGGCGC AACGGTCAGC TCCGCTTTGG GCAAAGTGTT
351 GTGGGACGGC GAATCCTTTG TCTGGTGGGC ATTGGCAAAC GGCGCACTGA
401 TCGTGCTGTG GCTGGTTTTC GGCGCACGCA GAACGGGCGG GCTGAAAACC
451 GTTTCGATGC TGCTGATGCT GCTTGCCGTG TTGTGGTTGA GCGTCGAAGT
501 GTTCGCTTCG TCCGGCACAA ACGCCGCGCC CGCCGTTTCA GACGGCATGA
551 CCTTCGGAAC GGCAGTCGAA CTGTCCGCCG TCATGCCGCT TTCCTGGCTG
601 CCGCTGGCCG CCGACTACAC GCGCCAAGCA CGCCGCCCGT TTGCGGCAAC
651 CCTGACGGCA ACGCTCGCCT ATACGCTGAC GGGCTGCTGG ATGTATGCCT
701 TGGGTTTGGC GGCGGCTCTG TTTACCGGAG AAACCGACGT GGCGAAAATC
751 CTGTTGGGCG CGGGCTTGGG CATAACGGGC ATTCTGGCAG TCGTCCTCTC
801 CACCGTTACC ACAACGTTTC TCGATACCTA TTCCGCCGGC GCGAGTGCGA
851 ACAACATTTC CGCGCGTTTT GCGGAAATAC CCGTCGCTGT CGGCGTTACC
901 CTGATCGGCA CGGTGCTTGC CGTCATGCTG CCCGTTACCG AATATAAAAA
951 CTTCCTGCTG CTTATCGGCT CGGTATTTGC GCCGATGGCG GCGGTTTTGA
1001 TTGCCGACTT TTTCGTCTTA AAACGGCGTG AGGAGATTGA AGGCTTTGAC
1051 TTTGCCGGAC TGGTTCTGTG GCTGGCAGGC TTCATCCTCT ACCGCTTCCT
1101 GCTCTCGTCC GGTTGGGAAA GCAGCATCGG TCTGACCGCC CCCGTAATGT
1151 CTGCCGTTGC CATTGCCACC GTATCGGTAC GCCTTTTCTT TAAAAAAACC
1201 CAATCTTTAC AAAGGAACCC GTCATGA

This corresponds to the amino acid sequence <SEQ ID 808; ORF125ng-1>:

1 MSGNASSPSS SAAIGLVWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH
51 AVGGALFFAA AYIGALTGRS SMESVRLSFG KCGSVLFSVA NMLQLAGWTA
101 VMIYVGATVS SALGKVLWDG ESFVWWALAN GALIVLWLVF GARRTGGLKT
151 VSMLLMLLAV LWLSVEVFAS SGTNAAPAVS DGMTFGTAVE LSAVMPLSWL
201 PLAADYTRQA RRPFAATLTA TLAYTLTGCW MYALGLAAAL FTGETDVAKI
251 LLGAGLGITG ILAVVLSTVT TTFLDTYSAG ASANNISARF AEIPVAVGVT
301 LIGTVLAVML PVTEYKNFLL LIGSVFAPMA AVLIADFFVL KRREEIEGFD
351 FAGLVLWLAG FILYRFLLSS GWESSIGLTA PVMSAVAIAT VSVRLFFKKT
401 QSLQRNPS*

ORF125ng-1 and ORF125-1 show 95.1% identity in 408 aa overlap:

                      10        20        30        40        50        60
orf125-1.pep  MSGNASSPSSSSAIGLIWFGAAVSIAEISTGTLLAPLGWQRGLAALLLGHAVGGALFFAA
              |||||||||||:||||:|||||||||||||||||||||||||||||||||||||||||||
orf125ng-1    MSGNASSPSSSAAIGLVWFGAAVSIAEISTGTLLAPLGWQRGLAALLLGHAVGGALFFAA
                      10        20        30        40        50        60



70 80 90 100 110 120 

orf 125-1 . pep AYIGALTGRS SMESVRLSFGKRGSVLFSVANMLQLAGWTAVMIYAGATVSSALGKVLWDG 
I II M I M I I I I I I M I I I II I I I I H I II I I I I I I I I M II I : II I II I I I M M I I I 
orf125ng-1 AYIGALTGRSSMESVRLSFGKCGSVLFSVANMLQLAGWTAVMIYVGATVSSALGKVLWDG

70 80 90 100 110 120 



130 140 150 160 170 179 

orf125-1.pep ESFVWWALANGALIVLWLVFGARKTGGLKTVSMLLMLLAVLWLSAEVFSTAGSTAAQ-VS
I I ! I I I M II I II I II I I I I I I I •* I I I I I I I I I M I II I I I I I I : II I ::: I :: M II 
orf 125ng-l ESFVWWALANGALIVLWLVFGARRTGGLKTVSMLLMLLAVLWLSVEVFAS SGTNAAPAVS 

130 140 150 160 170 180 



180 190 200 210 220 230 239 

orf 125- 1 . pep DGMSFGTAVELSAVMPLSWLPLAADYTRHARRPFAATLTATLAYTLTGCWMYALGLAAAL 
1 | I : I I I M M 1 1 I I I ! I II I M I I I I I : I I 1 1 I I I M II I M II I I I I I II 1 I I I I I I I 
orfl25ng-l DGMT FGTAVE L S AVMPL S WL PLAAD YTRQARRP FAATLTAT LAYTLTGCWMYALGLAAAL 

190 200 210 220 230 240 



240 250 260 270 280 290 299 

orf 125-1 . pep FTGETDVAKILLGAGLGAAGILAWLSTVTTTFLDAYSAGASANNISARFAETPVAVGVT 
M I I I I I I II M I II I I : I I I I I II I I I I I II I I : I I II I I I I I I 1 I I I I I (MINI 
orf 125ng-l FTGETDVAKILLGAGLGITGILAVVLSTVTTTFLDTYSAGASANNISARFAEIPVAVGVT 

250 260 270 280 290 300 



300 310 320 330 340 350 359 

orf 12 5-1. pep LIGTVLAVMLPVTEYENFLLLIGSVFAPMAAVLIADFFVLKRREEIEGFDFAGLVLWLAG 
I 1 I I I I I i I I M II I : I II I I I I I I I I II I II I I I I I I M I M I II I I I I II I I I I I I M 
orfl25ng-l LIGTVLAVMLPVTEYKNFLLLIGSVFAPMAAVLIADFFVLKRREEIEGFDFAGLVLWLAG 

310 320 330 340 350 360 



360 370 380 390 400 

orf 125-1 . pep F I LYRFLLS S GWE SSI GLTAPVM SAVAI AT VSVRLFFKKT QSLQRN P SX 
I I I II II I I I I I I I I I I I M I I I It I I I I I I I I M I I I I I M I I I M I I 
orfl25ng-l FILYRFLLSSGWESSIGLTAPVM SAVAI ATVSVRLFFKKTQSLQRNPSX 

370 380 390 400 



Based on this analysis, including the presence of a putative leader sequence and transmembrane
domains in the gonococcal protein, it is predicted that the proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.

Example 96 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 809>: 

1 ATGACCCGTA TCGCCATCCT CGGCGGCGGC CTCTCGGGAA GGCTGACCGC

51 GTTGCAGCTT GCAGAACAAG GTTATCAGAT TGCACTTTTC GATAAAAGCT 

101 GCCGCCGGGG CGAACACGCC GCCGCCTATG TAGCCGCCGC CATGCTCGCG 

151 CCTGCAGCGG A . ACGGTCGA AGCCACGCCC GAAGTGGTCA GGCTGGGCAG 

201 GCAGAGCATC CCGCTTTGGC GCGGCATCCG ATGCCGTCTG AACACGCACA 

251 CGATGATGCA GGAAAACGGC AGCCTGATTG TATGGCACGG GCAGGACAAG

301 CCATTATCCA GCGAGTTCGT CCGCCATCTC AAACGCGGCG GCGT.ACGGA 

351 TGACGAAATC GTCCGTTGGC GCGCCGACGA CATCGCCGAA CGCGAACCGC 

401 AACTCGGCGG ACGTTTTTAA GACGGCATCT ACCTGCCGAC CGAAGC.CAG 

451 CTCGACGGGC GGCAATTATA GTCTGCACTT GCCGACGCTT TGGACGAACT 

501 GAACGTCCCC TGCCATTGGG AACACGAATG CGTCCCCGAA GCCTGCAAG..

This corresponds to the amino acid sequence <SEQ ID 810; ORF126>: 

1 MTRIAILGGG LSGRLTALQL AEQGYQIALF DKSCRRGEHA AAYVAAAMLA
51 PAAXTVEATP EVVRLGRQSI PLWRGIRCRL NTHTMMQENG SLIVWHGQDK
101 PLSSEFVRHL KRGGXTDDEI VRWRADDIAE REPQLGGRFX DGIYLPTEXQ
151 LDGRQLXSAL ADALDELNVP CHWEHECVPE ACK...

Further work revealed the complete nucleotide sequence <SEQ ID 811>:

1 ATGACCCGTA TCGCCATCCT CGGCGGCGGC CTCTCGGGAA GGCTGACCGC 

51 GTTGCAGCTT GCAGAACAAG GTTATCAGAT TGCACTTTTC GATAAAGGCT 

101 GCCGCCGGGG CGAACACGCC GCCGCCTATG TTGCCGCCGC CATGCTCGCG 

151 CCTGCGGCGG AAGCGGTCGA AGCCACGCCC GAAGTGGTCA GGCTGGGCAG

201 GCAGAGCATC CCGCTTTGGC GCGGCATCCG ATGCCGTCTG AACACGCACA 

251 CGATGATGCA GGAAAACGGC AGCCTGATTG TGTGGCACGG GCAGGACAAG 

301 CCATTATCCA GCGAGTTCGT CCGCCATCTC AAACGCGGCG GCGTAGCGGA 

351 TGACGAAATC GTCCGTTGGC GCGCCGACGA CATCGCCGAA CGCGAACCGC 

401 AACTCGGCGG ACGTTTTTCA GACGGCATCT ACCTGCCGAC CGAAGGCCAG

451 CTCGACGGGC GGCAAATATT GTCTGCACTT GCCGACGCTT TGGACGAACT 

501 GAACGTCCCC TGCCATTGGG AACACGAATG CGTCCCCGAA GGCCTGCAAG 

551 CCCAATACGA CTGGCTGATC GACTGCCGCG GCTACGGCGC AAAAACCGCG 

601 TGGAACCAAT CCCCCGAGCA CACCAGCACC CTGCGCGGCA TACGCGGCGA 

651 AGTGGCGCGG GTTTACACAC CCGAAATCAC GCTCAACCGC CCCGTGCGTC

701 TGCTCCATCC GCGTTATCCG CTCTACATCG CCCCGAAAGA AAACCACGTC

751 TTCGTCATCG GCGCGACCCA AATCGAAAGC GAAAGCCAAG CCCCCGCCAG

801 CGTGCGTTCA GGGTTGGAAC TCTTGTCCGC ACTCTATGCC ATCCACCCCG 

851 CCTTCGGCGA AGCCGACATC CTCGAAATCG CCACCGGCCT GCGCCCCACG 

901 CTCAACCACC ACAACCCCGA AATCCGTTAC AACCGCGCCC GACGCCTGAT

951 TGAAATCAAC GGCCTTTTCC GCCACGGTTT CATGATCTCC CCCGCCGTAA 

1001 CCGCCGCCGC CGCCAGATTG GCAGTGGCAC TGTTTGACGG AAAAGACGCG 

1051 CCCGAACGCG ATAAAGAAAG CGGTTTGGCG TATATCCGAA GACAAGATTA 

1101 A 

This corresponds to the amino acid sequence <SEQ ID 812; ORF126-1>:

1 MTRIAILGGG LSGRLTALQL AEQGYQIALF DKGCRRGEHA AAYVAAAMLA
51 PAAEAVEATP EVVRLGRQSI PLWRGIRCRL NTHTMMQENG SLIVWHGQDK
101 PLSSEFVRHL KRGGVADDEI VRWRADDIAE REPQLGGRFS DGIYLPTEGQ
151 LDGRQILSAL ADALDELNVP CHWEHECVPE GLQAQYDWLI DCRGYGAKTA
201 WNQSPEHTST LRGIRGEVAR VYTPEITLNR PVRLLHPRYP LYIAPKENHV
251 FVIGATQIES ESQAPASVRS GLELLSALYA IHPAFGEADI LEIATGLRPT
301 LNHHNPEIRY NRARRLIEIN GLFRHGFMIS PAVTAAAARL AVALFDGKDA
351 PERDKESGLA YIRRQD*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF126 shows 90.0% identity over a 180 aa overlap with an ORF (ORF126a) from strain A of
N. meningitidis:

10 20 30 40 50 60 

or f 12 6 pep MTR I AI LGGGL S GRLT ALQLAEQG YQ I AL FDKS CRRGEHAAAYVAAAMLAPAAXT VEAT P 
i M t I t t I t M I I I M I t M M I 11 I I ! t I M : II M 1 M 11 I I I M 1 I I 11 I :IIHI 
orfl26a MTRIAILGGGLSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP 

10 20 30 40 50 60 

70 80 90 100 110 120 

or f 12 6 pep EVVRLGRQSIPLWRGIRCRLNTHTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGXTDDEI 
111(1111 1111(1(11:1:1 : I I I I I I I I I II I I I I I I I : I I I I II I I I I : I I I 
orfl2 6a EWRLGRQXIPLWRGIRCHLKTPAMMXENGSLIVWHGQDKPLSNEFVRHLKRGGVADDXI 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 12 6 pep VRWRADDIAEREPQLGGRFXDGIYLPTEXQLDGRQLXSALADALDELNVPCHWEHECVPE 
I I I I I I I M I I I I I I I I II I I I I I It I II I I I I : I I II I I I I I M I I I I I I I I I = I I 
orf 126a VRWRADDIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPE 

130 140 150 160 170 180 

The complete length ORF126a nucleotide sequence <SEQ ID 813> is:

1 ATGACCCGTA TCGCCATCCT CGGCGGCGGC CTCTCNGGAA GGCTGACCGC 

51 ACTGCAGCTT GCAGAACAAG GTTATCAGAT TGCACTTTTC GATAAAGGCT 

101 GCCGCCGGGG CGAACACGCC GCCGCCTATG TTGCCGCCGC CATGCTCGCG 

151 CCTGCGGCGG AAGCGGTCGA AGCCACGCCT GAAGTGGTCA GGCTGGGCAG 

201 GCAGANCATC CCGCTTTGGC GCGGCATCCG ATGCCATCTG AAAACGCCTG

251 CCATGATGCA NGAAAACGGC AGCCTGATTG TGTGGCACGG GCAGGACAAA 

301 CCTTTATCCA ACGAGTTCGT CCGCCATCTC AAACGCGGCG GCGTAGCGGA 

351 TGACNAAATC GTCCGTTGGC GCGCCGACGA CATCGCCGAA CGCGAACCGC 

401 AACTCGGCGG ACGTTTTTCA GACGGCATCT ACCTGCCGAC CGAAGGCCAG 

451 CTCGACGGGC GGCAAATATT GTCTGCACTT GCCGACGCTT TGGACGAACT 

501 GAACGTCCCC TGCCATTGGG AACACGAATG TGCCCCCGAA GACTTGCAAG 

551 CCCAATACGA CTGGCTGATC GACTGCCGCG GCTACGGCGC AAAAACCGCG 

601 TGGAACCAAT CCCCCGANNA NACCAGCACC CTGCGCGGCA TACGCGGCGA 

651 AGTGGCGCGG GTTTACACAC CCGAAATCAC GCTCAACCGC CCCGTGCGCC 

701 TGCTACACCC GCGCTATCCG CTNTACATCG CCCCGAAAGA AAACCNCGTC

751 TTCGTCATCG GCGCGACCCA AATCGAAAGC GAAAGCCAAG CACCTGCCAG

801 CGTGCGTTCC GGGCTGGAAC TCTTATCCGC ACTCTATGCC GTCCACCCCG 

851 CCTTCGGCGA AGCCGACATC CTCGAAATCG CCACCGGCCT GCGCCCCACG 

901 CTCAATCACC ACAACCCCGA AATCCGTTAC AACCGCGCCC GACGCCTGAT 

951 TGAAATCAAC GGCCTTTTCC GCCACGGTTT CATGATCTCC CCCGCCGTAA 

1001 CCGCCGCCGC CGTCAGATTG GCAGTGGCAC TGTTTGACGG AAAAGANGCG 

1051 CCCGAACGCG ATGAAGAAAG CGGTTTGGCG TATATCCGAA GACAAGATTA 

1101 A 

This encodes a protein having amino acid sequence <SEQ ID 814>: 

1 MTRIAILGGG LSGRLTALQL AEQGYQIALF DKGCRRGEHA AAYVAAAMLA
51 PAAEAVEATP EVVRLGRQXI PLWRGIRCHL KTPAMMXENG SLIVWHGQDK
101 PLSNEFVRHL KRGGVADDXI VRWRADDIAE REPQLGGRFS DGIYLPTEGQ
151 LDGRQILSAL ADALDELNVP CHWEHECAPE DLQAQYDWLI DCRGYGAKTA
201 WNQSPXXTST LRGIRGEVAR VYTPEITLNR PVRLLHPRYP LYIAPKENXV
251 FVIGATQIES ESQAPASVRS GLELLSALYA VHPAFGEADI LEIATGLRPT
301 LNHHNPEIRY NRARRLIEIN GLFRHGFMIS PAVTAAAVRL AVALFDGKXA
351 PERDEESGLA YIRRQD*

ORF126a and ORF126-1 show 95.4% identity in 366 aa overlap:



10 20 30 40 50 60 

orf 12 6a. pep MTRIAILGGGLSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP 
I II I I I I I I I M i I I I I i I I I I I I I I I II I I M (I I I I I I I I M i t I I I I I I I I I I I M I 
orf 126-1 MTRIAILGGGLSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP 

10 20 30 40 50 60 

orfl26a.pep 
orfl26-l 

orf 126a . pep 
orfl26-l 

orf 126a. pep 
orfl26-l 

orf 126a . pep 
orfl26-l 

orf 126a . pep 
orfl26-l 

orf 126a. pep 
orf 126-1 



70 80 90 100 110 120 

EWRLGRQXIPLWRGIRCHLKTPAMMXENGSLIVWHGQDKPLSNEFVRHLKRGGVADDXI 

I 11 II II I I I II I I M I : I : I : I I I I I I I I I II I I I I I I I : I I I I I I I I 1 I I I I I I 
EWRLGRQSIPLWRGIRCRLNTHTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEI 
70 80 90 100 110 120 

130 140 150 160 170 180 

VRWRADDIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPE 

I I I I I I I I I I I I I I I I I I I I I I 1 I I I II I I I I I I I I I II I I I I I I I I I I I I I M I I I : I I 
VRWRADDIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECVPE 
130 140 150 160 170 180 

190 200 210 220 230 240 

DLQAQYDWLIDCRGYGAKTAWNQSPXXTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYP 

I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I II I I II 1 I I I I I I I I I I I I I I I I I I 
GLQAQYDWLIDCRGYGAKTAWNQSPEHTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYP 
190 200 210 220 230 240 

250 260 270 280 290 300 

LYIAPKENXVFVIGATQIESESQAPASVRSGLELLSALYAVHPAFGEADILEIATGLRPT 
I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I ) I I M : I I I I I I I I I I I II I I I I I I 
LYIAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAIHPAFGEADILEIATGLRPT 
250 260 270 280 290 300 

310 320 330 340 350 360 

LNHHNPEIRYNRARRLIEINGLFRHGFMISPAVTAAAVRLAVALFDGKXAPERDEESGLA 
I I M I! I I I I II I I I I I I I I I II I II I I I I I I M I I I : M I I I I I I I I 11111:11111 
LNHHN PE I RYNRARRL IE INGLFRHG FMI S PAVT AAAARLAVALFDGKDAPERDKE SGLA 

310 320 330 340 350 360 



YIRRQDX 
I I I I I I I 
YIRRQDX 




Homology with a predicted ORF from N. gonorrhoeae 

ORF126 shows 90% identity over a 180 aa overlap with a predicted ORF (ORF126ng) from
N. gonorrhoeae: 

orf 12 6 . pep MTRIAILGGGLSGRLTALQLAEQGYQIALFDKSCRRGEHAAAYVAAAMLAPAAXTVEATP 60 

I i I M : II I I I I I I I II I II I I I II I i 111): I : I II I I I I I I I M I I I I I : I I I I 1 
orfl26ng MTRIAVLGGGLSGRLTALQLAEQGYQIELFDKGTRQGEHAAAYVAAAMLAPAAEAVEATP 60 

orf 12 6. pep E WRLGRQS I PLWRG I RCRLNTHTMMQENG S L I VWHGQDKPL S SE FVRHLKRGGXT DDE I 120 

I I : I I II I II I I I I I I II M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I 
orfl26ng EVIRLGRQSIPLWRGIRCRLNTLTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEI 120 

orf 126 . pep VRWRADDIAEREPQLGGRFXDGIYLPTEXQLDGRQLXSALADALDELNVPCHWEHECVPE 180 

M II I I : I I I I I I I I I I I I I I I I I I I I I I I I I I : I II I I I I II I I II I I I I I M : I : 
orfl2 6ng VRWRADEIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPQ 180 

An ORF126ng nucleotide sequence <SEQ ID 815> was predicted to encode a protein having amino
acid sequence <SEQ ID 816>:



1 MTRIAVLGGG LSGRLTALQL AEQGYQIELF DKGTRQGEHA AAYVAAAMLA
51 PAAEAVEATP EVIRLGRQSI PLWRGIRCRL NTLTMMQENG SLIVWHGQDK
101 PLSSEFVRHL KRGGVADDEI VRWRADEIAE REPQLGGRFS DGIYLPTEGQ
151 LDGRQILSAL ADALDELNVP CHWEHECAPQ DLQAQYDWVI DCRGYGAKTA
201 WNQSPEHTST LRGIRGEVRG FTRPKSRSTA PCACCTRAIR STSPRKKTTS
251 SSSARPKSKA KAKPPPAYVP GWNSYPRSMP STPPSAKPTS SKWRPGLRPT
301 LNHHNPEIRY SRERRLIEIN GLFRHGFMIS PAVTAAAVRL AVALFDGKDA
351 PERDEESGLA YIGRQD*



Further work revealed the following gonococcal DNA sequence <SEQ ID 817>:

1 ATGACCCGTA TCGCCGTCCT CGGAGGCGGC CTTTCCGGAA GGCTGACCGC

51 ATTGCAGCTT GCAGAACAAG GTTATCAGAT TGAACTTTTC GACAAGGGCA 

101 CCCGCCAAGG CGAACACGCC GCCGCCTATG TTGCCGCCGC GATGCTCGCG 

151 CCTGCGGCGG AAGCGGTCGA GGCAACGCCC GAAGTCATCA GGCTGGGCAG 

201 GCAGAGCATT CCGCTTTGGC GCGGCATCCG ATGCCGTCTG AACACGCTCA 

251 CGATGATGCA GGAAAACGGC AGCCTGATTG TGTGGCACGG GCAGGACAAG 

301 CCATTATCCA GCGAGTTCGT CCGCCATCTC AAACGCGGCG GCGTAGCGGA 

351 TGACGAAATC GTCCGTTGGC GCGCCGATGA AATCGCCGAA CGCGAACCGC 

401 AACTCGGCGG ACGTTTTTCA GACGGCATCT ACCTGCCGAC CGAAGGCCAG 

451 CTCGACGGGC GGCAAATATT GTCTGCACTT GCCGACGCTT TGGACGAACT 

501 GAACGTCCCT TGCCATTGGG AACACGAATG CGCCCCCCAA GACCTGCAAG 

551 CCCAATACGA CTGGGTAATC GACTGCCGGG GCTACGGCGC GAAAACCGCG 

601 TGGAACCAAT CCCCCGAGCA CACCAGCACC TTGCGCGGCA TACGCGGCGA 

651 AGTGGCGCGG GTTTACACGC CCGAAATCAC GCTCAACCGC CCCGTGCGCC 

701 TGCTGCACCC GCGCTATCCG CTCTACATCG CCCCGAAAGA AAACCACGTC

751 TTCGTCATCG GCGCGACCCA AATCGAAAGC GAAAGCCAAG CCCCCGCCAG 

801 CGTACGTTCC GGGCTGGAAC TCTTATCCGC GCTCTATGCC GTCCACCCCG 

851 CCTTCGGCGA AGCCGACATC CTCGAAATCG CCGCCGGCCT GCGCCCCACG 

901 CTCAACCACC ACAACCCCGA AATCCGCTAC AGCCGCGAAC GCCGCCTCAT 

951 CGAAATCAAC GGCCTTTTCC GGCACGGCTT TATGATTTCC CCCGCCGTAA 

1001 CCGCCGCCGC CGTCAGATTG GCAGTGGCAC TGTTTGACGG AAAAGACGCG 

1051 CCCGAACGTG ATGAAGAAAG CGGTTTGGCG TATATCGGAA GACAAGATTA 

1101 A 

This corresponds to the amino acid sequence <SEQ ID 818; ORF126ng-1>:

1 MTRIAVLGGG LSGRLTALQL AEQGYQIELF DKGTRQGEHA AAYVAAAMLA
51 PAAEAVEATP EVIRLGRQSI PLWRGIRCRL NTLTMMQENG SLIVWHGQDK
101 PLSSEFVRHL KRGGVADDEI VRWRADEIAE REPQLGGRFS DGIYLPTEGQ
151 LDGRQILSAL ADALDELNVP CHWEHECAPQ DLQAQYDWVI DCRGYGAKTA
201 WNQSPEHTST LRGIRGEVAR VYTPEITLNR PVRLLHPRYP LYIAPKENHV
251 FVIGATQIES ESQAPASVRS GLELLSALYA VHPAFGEADI LEIAAGLRPT
301 LNHHNPEIRY SRERRLIEIN GLFRHGFMIS PAVTAAAVRL AVALFDGKDA
351 PERDEESGLA YIGRQD*

ORF126ng-1 and ORF126-1 show 95.1% identity in 366 aa overlap:

10 20 30 40 50 60 

orf 126-1 . pep MTRIAILGGGLSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP 
I I I I I : I I i I II 1 I I II I I II I ! I I M I I I I I I : I I I I M I! M ! I I 1 M M II 1 II I 
orfl26ng-l MTRIAVLGGGLSGRLTALQLAEQGYQIELFDKGTRQGEHAAAYVAAAMLAPAAEAVEATP 

10 20 30 40 50 60 

70 80 90 100 110 120

orfl26-l .pep EVVRLGRQSIPLWRGIRCRLNTHTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEI 
M : II M M I I I ! I M I I I I I I I I I I I I I M I I I I I I I I I II I I I I I I I I I I I I I I I I I 
orfl26ng-l EVIRLGRQS I PLWRG IRCRLNTLTMMQENGS L I VWHGQDKPLS SE FVRHLKRGGVADDE I 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 12 6-1 . pep VRWRADDIAEREPQLGGRFSDGI YLPTEGQLDGRQILSALADALDELNVPCHWEHECVPE 
I I I I i I : I II I I I I 11 I I I II I I I I II I I I II II I I I I I I I I II I I M i II II I M i : I : 
orfl26ng-l VRWRADEIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPQ 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 126-1. pep GLQAQYDWLIDCRGYGAKTAWNQS PEHTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYP 
II M I I I : I I I I 11 I I I I I I I I I I M I I M I I I I I I I I I I I M I 1 II I I II I I M M I I 
orfl26ng-l DLQAQYDWVI DCRGYGAKTAWNQSPEHTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYP 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 12 6-1 .pep LYIAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAIHPAFGEADILEIATGLRPT 
M I II I I I I I I I II I I I I I I I I II I I I I I I I I II I I I II I : II II I I I I I I I I I : I I I I I 
orfl26ng-l LYIAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAVHPAFGEADILEIAAGLRPT 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 12 6-1. pep LNHHN PE I RYNRARRLIE INGLFRHGFMI S PAVT AAAARLAVALFDGKDAPERDKESGLA 
I I I I I I I M I : ! I I I I I I I I I I I I I I I I I I 1 I I I I I : M I I I I I I I I I I I I II : I I I I I 
orf126ng-1 LNHHNPEIRYSRERRLIEINGLFRHGFMISPAVTAAAVRLAVALFDGKDAPERDEESGLA

310 320 330 340 350 360 

orf 126-1. pep YIRRQDX 
It MM 

orfl26ng-l YIGRQDX 

Furthermore, ORF126ng-1 shows homology to a putative Rhizobium oxidase flavoprotein:

gi|2627321 (AF004408) putative amino acid oxidase flavoprotein [Rhizobium etli]
Length = 327
Score = 169 bits (423), Expect = 3e-41

Identities = 112/329 (34%), Positives = 163/329 (49%), Gaps = 25/329 (7%)

Query: 3 RIAVLGGGLSGRLTALQLAEQGYQIELFDKGTRQGEHXXXXXXXXXXXXXXXXXXXXXXX 62 

RI V G G++G A QL G+++ L ++ G 
Sbjct : 2 RILVNGAGVAGLTVAWQLYRHGFRVTLAERAGTVGA-GASGFAGGMLAPWCERESAEEPV 60 

Query: 63 IRLGRQSIPLWRGIRCRLNTLTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEIVR 122 

+ LGR + W + G+L+V G+D F R G DE+ 

Sbjct : 61 LTLGRLAADWWEAA LPGHVHRRGTLWAGGRDTGELDRFSRRTS-GWEWLDEVA- 113 

Query: 123 WRADEIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPQDL 182 

IA EP L GRF ++■ E LD RQ L+ALA L++ + + 
Sbjct: 114 IAALEPDLAGRFRRALFFRQEAHLDPRQALAALAAGLEDARMRLTLG WGES 165 

Query: 183 QAQYDWVIDCRGYGAKTAWNQSPEHTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYPLY 242

+D V+DC G LRG+RGE+ V T E++L+RPVRLLHPR+P+Y 

Sbjct: 166 DVDHDRWDCTGAA QIGRLPGLRGVRGEMLCVETTEVSLSRPVRLLHPRHPIY 218 

Query: 243 IAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAVHPAFGEADILEIAAGLRPTLN 302 

I P++ + F++GAT IES+ P + RS +ELL+A YA+HPAFGEA + E AG+RP 
Sbjct: 219 IV PRDKNRFMVGATMI E S DDGG P ITARS LMELLNAAYAMH PAFGEARVTETGAGVRPAYP 278 

Query: 303 HHNPEIRYSRERRLIEINGLFRHGFMISP 331 

+ P R ++E R +* +NGL4-RHGF+++P 
Sbjct: 27 9 DN L P — RVTQE GRT LH VNGL YRHG FLLAP 305 
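
The percentages on the "Identities / Positives / Gaps" line of the report above can be re-derived from the raw counts. The Python sketch below is illustrative only; the helper name `blast_percent` is hypothetical, and integer truncation of the ratio is an assumption that happens to be consistent with the figures printed in the report (simple rounding would give 50% and 8% for the last two).

```python
def blast_percent(count: int, alignment_length: int) -> int:
    """Integer-truncated percentage of `count` over `alignment_length`.

    Truncation (rather than rounding) is an assumption chosen because it
    reproduces the values shown in the report line above.
    """
    if alignment_length <= 0:
        raise ValueError("alignment length must be positive")
    return (100 * count) // alignment_length


ALIGNMENT_LENGTH = 329
# name -> (raw count from the report, percentage printed in the report)
REPORTED = {
    "Identities": (112, 34),
    "Positives": (163, 49),
    "Gaps": (25, 7),
}

if __name__ == "__main__":
    for name, (count, printed) in REPORTED.items():
        computed = blast_percent(count, ALIGNMENT_LENGTH)
        status = "consistent" if computed == printed else "differs"
        print(f"{name}: {count}/{ALIGNMENT_LENGTH} -> {computed}% ({status})")
```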

This analysis suggests that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, 
could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 97 



The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 819>:

1 ATGACTGATA ATCGGGGGTT TACGCTGGTT GAATTAATAT CAGTGGTCTT

51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG 

101 TTGAGAAAGC AAAGATAAAT GCAGTGCGGG CAGCCTTGTT AGAAAATGCA 

151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGGTTTA AACAAACATC 

201 TACCAAGTGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC

251 GTTTGAATGG AATCGtCGCG CGGG..GCTT TAGACAGTAA ATTCATGTTG 

301 AAGGCGGTAG CCATAGATAA AGATAAAAAT CCTTTTATTA TTAAGATGAA 

351 TGAAAATCTA GTAACCTTTA j|TTTGCAAGA AGTCCGCCAG TTCGTGTAGT 

401 GACGGGCTGG ATTATTTTAA AGGAAATGAT AAGGACTGCA AGTTACTTAA

451 GTAG

This corresponds to the amino acid sequence <SEQ ID 820; ORF127>: 



1 MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN AVRAALLENA

51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIVA RXALDSKFML 

101 KAVAIDKDKN PFIIKMNENL VTFICKKSAS SCSDGLDYFK GNDKDCKLLK 

151 * 
Further work revealed the following DNA sequence <SEQ ID 821>:

1 ATGACTGATA ATCGGGGGTT TACGCTGGTT GAATTAATAT CAGTGGTCTT

51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG 

101 TTGAGAAAGC AAAGATAAAT GCAGTGCGGG CAGCCTTGTT AGAAAATGCA 

151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGGTTTA AACAAACATC 

201 TACCAAGTGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC 

251 GTTTGAATGG AATCGCGCGC GGGGCTTTAG ACAGTAAATT CATGTTGAAG 

301 GCGGTAGCCA TAGATAAAGA TAAAAATCCT TTTATTATTA AGATGAATGA 

351 AAATCTAGTA ACCTTTATTT GCAAGAAGTC CGCCAGTTCG TGTAGTGACG 

401 GGCTGGATTA TTTTAAAGGA AATGATAAGG ACTGCAAGTT ACTTAAGTAG

This corresponds to the amino acid sequence <SEQ ID 822; ORF127-1>:

1 MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN AVRAALLENA
51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIAR GALDSKFMLK
101 AVAIDKDKNP FIIKMNENLV TFICKKSASS CSDGLDYFKG NDKDCKLLK*
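
The correspondence between a nucleotide sequence and its amino acid sequence can be illustrated with a short translation routine. The Python sketch below is not taken from the original work: the function name `translate` is hypothetical, the standard genetic code is assumed, and only the first 30 and last 18 nucleotides of the ORF127-1 coding sequence, as printed above, are used as inputs.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string in reading frame 1.

    Whitespace (as used in the printed 10-base blocks) is ignored, codons
    containing ambiguous bases such as N are reported as 'X', stop codons
    as '*', and any trailing incomplete codon is dropped.
    """
    seq = "".join(dna.split()).upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


if __name__ == "__main__":
    # First 30 nucleotides of the ORF127-1 coding sequence.
    print(translate("ATGACTGATA ATCGGGGGTT TACGCTGGTT"))  # MTDNRGFTLV
    # Last 18 nucleotides, ending in the stop codon.
    print(translate("TGCAAGTTACTTAAGTAG"))  # CKLLK*
```

Both outputs agree with the start and end of the amino acid sequence listed above.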

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF127 shows 98.0% identity over a 150 aa overlap with an ORF (ORF127a) from strain A of
N. meningitidis:

                    10        20        30        40        50        60
orf127.pep  MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN
            ||||||||||||||||||||||||||||||||||||||||:|||||||||||||||||||
orf127a     MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINTVRAALLENAHFMEKFYLQN
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf127.pep  GRFKQTSTKWPSLPIKEAEGFCIRLNGIVARXALDSKFMLKAVAIDKDKNPFIIKMNENL
            |||||||||||||||||||||||||||| || ||||||||||||||||||||||||||||
orf127a     GRFKQTSTKWPSLPIKEAEGFCIRLNGI-ARGALDSKFMLKAVAIDKDKNPFIIKMNENL
                    70        80         90       100       110

                   130       140       150
orf127.pep  VTFICKKSASSCSDGLDYFKGNDKDCKLLKX
            |||||||||||||||||||||||||||||||
orf127a     VTFICKKSASSCSDGLDYFKGNDKDCKLLKX
          120       130       140       150

The complete length ORF127a nucleotide sequence <SEQ ID 823> is:



1 ATGACTGATA ATCGGGGGTT TACGCTGGTT GAATTAATAT CAGTGGTCTT 

51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG 

101 TTGAGAAAGC AAAGATAAAT ACAGTGCGGG CAGCCTTGTT AGAAAATGCA 

151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGATTTA AACAAACATC 

201 TACCAAATGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC 

251 GTTTGAATGG AATCGCGCGC GGGGCCTTAG ACAGTAAATT CATGTTGAAG 

301 GCGGTAGCCA TAGATAAAGA TAAAAATCCT TTTATTATTA AGATGAATGA 

351 AAATCTAGTA ACCTTTATTT GCAAGAAGTC CGCCAGTTCG TGTAGTGACG 

401 GGCTGGATTA TTTTAAAGGA AATGATAAGG ACTGCAAGTT ACTTAAGTAG

This encodes a protein having amino acid sequence <SEQ ID 824>: 



1 MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN TVRAALLENA
51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIAR GALDSKFMLK 
101 AVAIDKDKNP FIIKMNENLV TFICKKSASS CSDGLDYFKG NDKDCKLLK* 

ORF127a and ORF127-1 show 99.3% identity in 149 aa overlap: 



                      10        20        30        40        50        60
orf127a.pep   MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINTVRAALLENAHFMEKFYLQN
              ||||||||||||||||||||||||||||||||||||||||:|||||||||||||||||||
orf127-1      MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf127a.pep   GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf127-1      GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV
                      70        80        90       100       110       120

                     130       140       150
orf127a.pep   TFICKKSASSCSDGLDYFKGNDKDCKLLKX
              ||||||||||||||||||||||||||||||
orf127-1      TFICKKSASSCSDGLDYFKGNDKDCKLLKX
                     130       140       150

Homology with a predicted ORF from N. gonorrhoeae

ORF127 shows 97.3% identity over a 150 aa overlap with a predicted ORF (ORF127ng) from
N. gonorrhoeae:

orf127.pep  MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN  60
            |||||||||||||||||||||||||||||||||||||||||||||:||||||||||||||
orf127ng    MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAAFLENAHFMEKFYLQN  60

orf127.pep  GRFKQTSTKWPSLPIKEAEGFCIRLNGIVARXALDSKFMLKAVAIDKDKNPFIIKMNENL 120
            |||||||||||||||||||||||||||| || ||||||||||||||||||||||||||||
orf127ng    GRFKQTSTKWPSLPIKEAEGFCIRLNGI-ARGALDSKFMLKAVAIDKDKNPFIIKMNENL 119

orf127.pep  VTFICKKSASSCSDGLDYFKGNDKDCKLLK 150
            |||||||||||||| |||||||||||||||
orf127ng    VTFICKKSASSCSDRLDYFKGNDKDCKLLK 149

The complete length ORF127ng nucleotide sequence <SEQ ID 825> is: 

1 ATGACTGATA ATCGGGGGTT TACACTGGTT GAATTAATAT CAGTGGTCTT

51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG 

101 TTGAGAAAGC AAAGATAAAT GCAGTGCGGG CAGCCTTGTT AGAAAATGCA

151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGATTTA AACAAACATC 

201 TACCAAATGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC 

251 GTTTGAATGG AATCGCGCGC GGGGCTTTAG ACAGTAAATT CATGTTGAAG 

301 GCGGTAGCCA TAGATAAAGA TAAAAATCCT TTTATTATTA AGATGAATGA 

351 AAATCTAGTA ACCTTTATTT GCAAGAAGTC CGCCAGTTCG TGTAGTGACG 

401 GGCTGGATTA TTTTAAAGGA AATGATAAGG ACTGCAAGTT ACTTAAGTAG 

This encodes a protein having amino acid sequence <SEQ ID 826>: 

1 MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN AVRAAFLENA
51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIAR GALDSKFMLK 
101 AVAIDKDKNP FIIKMNENLV TFICKKSASS CSDRLDYFKG NDKDCKLLK* 

ORF127ng and ORF127-1 show 100.0% identity in 149 aa overlap:

                      10        20        30        40        50        60
orf127-1.pep  MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf127ng-1    MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf127-1.pep  GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf127ng-1    GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV
                      70        80        90       100       110       120

                     130       140       150
orf127-1.pep  TFICKKSASSCSDGLDYFKGNDKDCKLLKX
              ||||||||||||||||||||||||||||||
orf127ng-1    TFICKKSASSCSDGLDYFKGNDKDCKLLKX
                     130       140       150

This analysis, including the fact that the predicted transmembrane domain is shared by the
meningococcal and gonococcal proteins, suggests that the proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.
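
The text does not say which program predicted the transmembrane domain. As a rough illustration of how such a hydrophobic stretch can be located, the sketch below applies a Kyte-Doolittle sliding-window average to the ORF127a sequence. The window length of 19 and the cutoff of 1.6 are conventional values for that scale and are assumptions here, not parameters taken from this document; the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(protein: str, window: int = 19):
    """Yield (0-based start index, mean hydropathy) for every full window."""
    scores = [KD[aa] for aa in protein]
    for start in range(len(scores) - window + 1):
        yield start, sum(scores[start:start + window]) / window


# ORF127a residues 1-149 as listed in the text (stop symbol omitted).
ORF127A = (
    "MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINTVRAALLENAHFMEKFYLQN"
    "GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV"
    "TFICKKSASSCSDGLDYFKGNDKDCKLLK"
)

best_start, best_score = max(hydropathy_windows(ORF127A), key=lambda w: w[1])
# A window mean above roughly 1.6 is commonly read as a candidate membrane-spanning segment.
print(f"most hydrophobic window starts at residue {best_start + 1}, mean = {best_score:.2f}")
```

Under these assumptions the most hydrophobic window falls in the N-terminal region, which is the stretch shared by the meningococcal and gonococcal sequences.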



Example 98

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 827>:

1 . . GTGTCGCTGG CTTCGGTGAT TGCCTCTCAA ATCTTCCTTT ACGAAGATTT 

51 CAACCAAATG CGGAAAACCC GTGGAGCTAT CTGCGGTTTT CTTGTCCAAT 

101 ATTTATCTGG GGTTTCAGCA GGGGTATTTC GATTTGAGTG CCGACGAGAA 

151 CCCCGTACTG CATATCTGGT CTTTGGCAGT AGAGGAACAG TATTACCTCC

201 TGTATCCCCT TTTGCTGATA TTTTGCTGCA AAAAAACCAA ATCGCTACGG 

251 GTGCTGCGTA ACATCAGCAT CATCCTGTTT TTGATTTTGA CTGCCTCATC 

301 GTTTTTGCCA AGCGGGTTTT ATACCGACAT CCTCAACCAA CCCAATACTT 

351 ATTACCTTTC GACACTGAGG TTTCCCGAGC TGTTGGCAGG TTCGCTGCTG 

401 GCGGTTTACG GGCAAACGCA AAACGGCAGA CGGCAAACAG CAAATGGAAA

451 ACGGCAGTTG CTTTCATCAC TCTGCTTCGG CGCATTGCTT GCCTGCCTGT 

501 TCGTGATTGA CAAACACAAT CCGTTTATCC CGGGAATGAC CCTGCTCCTT 

551 CCCTGCCTGC TGACGGCACT GCTTATCCGG AGTATGCAAT ACGGGACACT 

601 TCCGACCCGC ATCCTGTCGG CAAGCCCCAT CGTATTTGTC GGCAAAATCT 

651 CTTATTCCCT ATACCTGTAC CATTGGATTT TTATTGCTTT CGCTCCGCTC

701 ATTAGAGGCG GGAAACAGCT CGGACTGCCT GCCG..

This corresponds to the amino acid sequence <SEQ ID 828; ORF128>:



1 ..VSLASVIASQ IFLYEDFNQM RKTVELSAVF LSNIYLGFQQ GYFDLSADEN

51 PVLHIWSLAV EEQYYLLYPL LLIFCCKKTK SLRVLRNISI ILFLILTASS

101 FLPSGFYTDI LNQPNTYYLS TLRFPELLAG SLLAVYGQTQ NGRRQTANGK

151 RQLLSSLCFG ALLACLFVID KHNPFIPGMT LLLPCLLTAL LIRSMQYGTL

201 PTRILSASPI VFVGKISYSL YLYHWIFIAF APLIRGGKQL GLPA..

Further work revealed the complete nucleotide sequence <SEQ ID 829>: 



1 ATGCAAGCTG TCCGATACAG ACCGGAAATT GACGGATTGC GGGCCGTCGC 

51 CGTGCTATCC GTCATGATTT TCCACCTGAA TAACCGCTGG CTGCCCGGAG

101 GATTCCTGGG GGTGGACATT TTCTTTGTCA TCTCAGGATT CCTCATTACC 

151 GGCATCATTC TTTCTGAAAT ACAGAACGGT TCTTTTTCTT TCCGGGATTT 

201 TTATACCCGC AGGATTAAGC GGATTTATCC TGCCTTTATT GCGGCCGTGT 

251 CGCTGGCTTC GGTGATTGCC TCTCAAATCT TCCTTTACGA AGATTTCAAC 

301 CAAATGCGGA AAACCGTGGA GCTTTCTGCG GTTTTCTTGT CCAATATTTA

351 TCTGGGGTTT CAGCAGGGGT ATTTCGATTT GAGTGCCGAC GAGAACCCCG 

401 TACTGCATAT CTGGTCTTTG GCAGTAGAGG AACAGTATTA CCTCCTGTAT

451 CCCCTTTTGC TGATATTTTG CTGCAAAAAA ACCAAATCGC TACGGGTGCT

501 GCGTAACATC AGCATCATCC TGTTTTTGAT TTTGACTGCC TCATCGTTTT

551 TGCCAAGCGG GTTTTATACC GACATCCTCA ACCAACCCAA TACTTATTAC

601 CTTTCGACAC TGAGGTTTCC CGAGCTGTTG GCAGGTTCGC TGCTGGCGGT 

651 TTACGGGCAA ACGCAAAACG GCAGACGGCA AACAGCAAAT GGAAAACGGC 

701 AGTTGCTTTC ATCACTCTGC TTCGGCGCAT TGCTTGCCTG CCTGTTCGTG

751 ATTGACAAAC ACAATCCGTT TATCCCGGGA ATGACCCTGC TCCTTCCCTG

801 CCTGCTGACG GCACTGCTTA TCCGGAGTAT GCAATACGGG ACACTTCCGA

851 CCCGCATCCT GTCGGCAAGC CCCATCGTAT TTGTCGGCAA AATCTCTTAT 

901 TCCCTATACC TGTACCATTG GATTTTTATT GCTTTCGCCC ATTACATTAC

951 AGGCGACAAA CAGCTCGGAC TGCCTGCCGT ATCGGCGGTT GCCGCGTTGA 

1001 CGGCCGGATT TTCCCTGTTG AGTTATTATT TGATTGAACA GCCGCTTAGA 

1051 AAACGGAAGA TGACCTTCAA AAAGGCATTT TTCTGCCTCT ATCTCGCCCC

1101 GTCCCTGATA CTTGTCGGTT ACAACCTGTA CGCAAGGGGG ATATTGAAAC 

1151 AGGAACACCT CCGCCCGTTG CCCGGCGCGC CCCTTGCTGC GGAAAATCAT 

1201 TTTCCGGAAA CCGTCCTGAC CCTCGGCGAC TCGCACGCCG GACACCTGAG 

1251 GGGGTTTCTG GATTATGTCG GCAGCCGGGA AGGGTGGAAA GCCAAAATCC 

1301 TGTCCCTCGA TTCGGAGTGT TTGGTTTGGG TAGATGAGAA GCTGGCAGAC

1351 AACCCGTTAT GTCGAAAATA CCGGGATGAA GTTGAAAAAG CCGAAGCCGT

1401 TTTCATTGCC CAATTCTATG ATTTGAGGAT GGGCGGCCAG CCTGTGCCGA

1451 GATTTGAAGC GCAATCCTTC CTAATACCCG GGTTCCCAGC CCGATTCAGG

1501 GAAACCGTCA AAAGGATAGC CGCCGTCAAA CCCGTCTATG TTTTTGCAAA 

1551 CAACACATCA ATCAGCCGTT CGCCCCTGAG GGAGGAAAAA TTGAAAAGAT 

1601 TTGCCGCAAA CCAATATCTC CGCCCCATTC AGGCTATGGG CGACATCGGC 

1651 AAGAGCAATC AGGCGGTCTT TGATTTGATT AAAGATATTC CCAATGTGCA 

1701 TTGGGTGGAC GCACAAAAAT ACCTGCCCAA AAACACGGTC GAAATATACG 

1751 GCCGCTATCT TTACGGCGAC CAAGACCACC TGACCTATTT CGGTTCTTAT

1801 TATATGGGGC GGGAATTCCA CAAACACGAA CGCCTGCTTA AATCTTCCCA 

1851 CGGCGGCGCA TTGCAGTAG 

This corresponds to the amino acid sequence <SEQ ID 830; ORF128-1>:

1 MQAVRYRPEI DGLRAVAVLS VMIFHLNNRW LPGGFLGVDI FFVISGFLIT

51 GIILSEIQNG SFSFRDFYTR RIKRIYPAFI AAVSLASVIA SQIFLYEDFN

101 QMRKTVELSA VFLSNIYLGF QQGYFDLSAD ENPVLHIWSL AVEEQYYLLY

151 PLLLIFCCKK TKSLRVLRNI SIILFLILTA SSFLPSGFYT DILNQPNTYY

201 LSTLRFPELL AGSLLAVYGQ TQNGRRQTAN GKRQLLSSLC FGALLACLFV

251 IDKHNPFIPG MTLLLPCLLT ALLIRSMQYG TLPTRILSAS PIVFVGKISY

301 SLYLYHWIFI AFAHYITGDK QLGLPAVSAV AALTAGFSLL SYYLIEQPLR

351 KRKMTFKKAF FCLYLAPSLI LVGYNLYARG ILKQEHLRPL PGAPLAAENH

401 FPETVLTLGD SHAGHLRGFL DYVGSREGWK AKILSLDSEC LVWVDEKLAD

451 NPLCRKYRDE VEKAEAVFIA QFYDLRMGGQ PVPRFEAQSF LIPGFPARFR

501 ETVKRIAAVK PVYVFANNTS ISRSPLREEK LKRFAANQYL RPIQAMGDIG

551 KSNQAVFDLI KDIPNVHWVD AQKYLPKNTV EIYGRYLYGD QDHLTYFGSY

601 YMGREFHKHE RLLKSSHGGA LQ*

Computer analysis of this amino acid sequence gave the following results: 

Homology with hypothetical integral membrane protein HI0392 of H. influenzae (accession number U32723)
ORF128 and HI0392 show 52% aa identity in 180 aa overlap:

Orf128:   1 VSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGFQQGYFDLSADENPVLHIWSLAV 60
            ++L S IAS IF+Y DFN++RKT+EL+  FLSN YLG  QGYFDLSA+ENPVLHIWSLAV
HI0392:  46 MALVSFIASAIFIYNDFNKLRKTIELAIAFLSNFYLGLTQGYFDLSANENPVLHIWSLAV 105

Orf128:  61 EEQXXXXXXXXXIFCCKKTKSLRVLRNISIILFLILTASSFLPSGFYTDILNQPNTYYLS 120
            E Q         I   KK + ++VL  I++ILF IL A+SF+ + FY ++L+QPN YYLS
HI0392: 106 EGQYYLIFPLILILAYKKFREVKVLFIITLILFFILLATSFVSANFYKEVLHQPNIYYLS 165

Orf128: 121 TLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLCFGALLACLFVIDKHNPFIPGMT 180
             LRFPELL GSLLA+Y    N + Q +     +L+ L    L +CLF+++ +  FIPG+T
HI0392: 166 NLRFPELLVGSLLAIYHNLSN-KVQLSKQVNNILAILSTLLLFSCLFLMNNNIAFIPGIT 224



Homology with a predicted ORF from N. meningitidis (strain A)

ORF128 shows 98.0% identity over a 244 aa overlap with an ORF (ORF128a) from strain A of N.
meningitidis:

                                                    10        20        30
orf128.pep                                  VSLASVIASQIFLYEDFNQMRKTVELSAVF
                                            ||||||||||||||||||||||||||||||
orf128a       ILSEIQNGSFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVF
                    60        70        80        90       100       110

                      40        50        60        70        80        90
orf128.pep    LSNIYLGFQQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISI
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128a       LSNIYLGFQQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISI
                   120       130       140       150       160       170

                     100       110       120       130       140       150
orf128.pep    ILFLILTASSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGK
              ||||||||:|||||||||||||||||||||||||||||||||||||||||||||||||||
orf128a       ILFLILTATSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGK
                   180       190       200       210       220       230

                     160       170       180       190       200       210
orf128.pep    RQLLSSLCFGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPI
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128a       RQLLSSLCFGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPI
                   240       250       260       270       280       290

                     220       230       240
orf128.pep    VFVGKISYSLYLYHWIFIAFAPLIRGGKQLGLPA
              |||||||||||||||||||||  | | |||||||
orf128a       VFVGKISYSLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKR
                   300       310       320       330       340       350

orf128a       KMTFKKAFFCLYLAPSLILVGYNLYARGILKQEHLRPLPGAPLAAENHFPETVLTLGDSH
                   360       370       380       390       400       410

The complete length ORF128a nucleotide sequence <SEQ ID 831> is:

1 ATGCAAGCTG TCCGATACAG ACCGGAAATT GACGGATTGC GGGCCGTCGC

51 CGTGCTATCC GTCATGATTT TCCACCTGAA TAACCGCTGG CTGCCCGGAG

101 GATTCCTGGG GGTGGACATT TTCTTTGTCA TCTCAGGATT CCTCATTACC

151 GGCATCATTC TTTCTGAAAT ACAGAACGGT TCTTTTTCTT TCCGGGATTT

201 TTATACCCGC AGGATTAAGC GGATTTATCC TGCTTTTATT GCGGCCGTGT

251 CGCTGGCTTC GGTGATTGCC TCTCAAATCT TCCTTTACGA AGATTTCAAC

301 CAAATGCGGA AAACCGTGGA GCTTTCTGCG GTTTTCTTGT CCAATATTTA

351 TCTGGGGTTT CAGCAGGGGT ATTTCGATTT GAGTGCCGAC GAGAACCCCG

401 TACTGCATAT CTGGTCTTTG GCAGTAGAGG AACAGTATTA CCTCCTGTAT

451 CCTCTTTTGC TGATATTTTG CTGCAAAAAA ACAAAATCGC TACGGGTGCT

501 GCGTAACATC AGCATCATCC TATTTCTGAT TTTGACTGCC ACATCGTTTT

551 TGCCAAGCGG GTTTTATACC GATATTCTCA ACCAACCCAA TACTTATTAC

601 CTTTCGACAC TGAGGTTTCC CGAGCTGTTG GCAGGTTCGC TGCTGGCGGT

651 TTACGGGCAA ACGCAAAACG GCAGACGGCA AACAGCAAAT GGAAAACGGC

701 AGTTGCTTTC ATCACTCTGC TTCGGCGCAT TGCTTGCCTG CCTGTTCGTG

751 ATTGACAAAC ACAATCCGTT TATCCCGGGA ATGACCCTGC TCCTTCCCTG

801 CCTGCTGACG GCACTGCTTA TCCGGAGTAT GCAATACGGG ACACTTCCGA

851 CCCGCATCCT GTCGGCAAGC CCCATCGTAT TTGTCGGCAA AATCTCTTAT

901 TCCCTATACC TGTACCATTG GATTTTTATT GCTTTCGCCC ATTACATTAC

951 AGGCGACAAA CAGCTCGGAC TGCCTGCCGT ATCGGCGGTT GCCGCGTTGA

1001 CGGCCGGATT TTCCCTGTTG AGTTATTATT TGATTGAACA GCCGCTTAGA

1051 AAACGGAAGA TGACCTTCAA AAAGGCATTT TTCTGCCTCT ATCTCGCCCC

1101 GTCCCTGATA CTTGTCGGTT ACAACCTGTA CGCAAGGGGG ATATTGAAAC

1151 AGGAACACCT CCGCCCGTTG CCCGGCGCGC CCCTTGCTGC GGAAAATCAT

1201 TTTCCGGAAA CCGTCCTGAC CCTCGGCGAC TCGCACGCCG GACACCTGCG

1251 GGGGTTTCTG GATTATGTCG GCAGCCGGGA AGGGTGGAAA GCCAAAATCC

1301 TGTCCCTCGA TTCGGAGTGT TTGGTTTGGG TAGATGAGAA GCTGGCAGAC

1351 AACCCGTTAT GTCGAAAATA CCGGGATGAA GTTGAAAAAG CCGAAGCCGT

1401 TTTCATTGCC CAATTCTATG ATTTGAGGAT GGGCGGCCAG CCCGTGCCGA

1451 GATTTGAAGC GCAATCCTTC CTAATACCCG GGTTCCCAGC CCGATTCAGG

1501 GAAACCGTCA AAAGGATAGC CGCCGTCAAA CCCGTCTATG TTTTTGCAAA

1551 CAACACATCA ATCAGCCGTT CGCCCCTGAG GGAGGAAAAA TTGAAAAGAT

1601 TTGCCGCAAA CCAATATCTC CGCCCCATTC AGGCTATGGG CGACATCGGC

1651 AAGAGCAATC AGGCGGTCTT TGATTTGATT AAAGATATTC CCAATGTGCA

1701 TTGGGTGGAC GCACAAAAAT ACCTGCCCAA AAACACGGTC GAAATATACG

1751 GCCGCTATCT TTACGGCGAC CAAGACCACC TGACCTATTT CGGTTCTTAT

1801 TATATGGGGC GGGAATTTCA CAAACACGAA CGCCTGCTTA AATCTTCTCG

1851 CGACGGCGCA TTGCAGTAG


This encodes a protein having amino acid sequence <SEQ ID 832>:

1 MQAVRYRPEI DGLRAVAVLS VMIFHLNNRW LPGGFLGVDI FFVISGFLIT

51 GIILSEIQNG SFSFRDFYTR RIKRIYPAFI AAVSLASVIA SQIFLYEDFN

101 QMRKTVELSA VFLSNIYLGF QQGYFDLSAD ENPVLHIWSL AVEEQYYLLY

151 PLLLIFCCKK TKSLRVLRNI SIILFLILTA TSFLPSGFYT DILNQPNTYY

201 LSTLRFPELL AGSLLAVYGQ TQNGRRQTAN GKRQLLSSLC FGALLACLFV

251 IDKHNPFIPG MTLLLPCLLT ALLIRSMQYG TLPTRILSAS PIVFVGKISY

301 SLYLYHWIFI AFAHYITGDK QLGLPAVSAV AALTAGFSLL SYYLIEQPLR

351 KRKMTFKKAF FCLYLAPSLI LVGYNLYARG ILKQEHLRPL PGAPLAAENH

401 FPETVLTLGD SHAGHLRGFL DYVGSREGWK AKILSLDSEC LVWVDEKLAD

451 NPLCRKYRDE VEKAEAVFIA QFYDLRMGGQ PVPRFEAQSF LIPGFPARFR

501 ETVKRIAAVK PVYVFANNTS ISRSPLREEK LKRFAANQYL RPIQAMGDIG

551 KSNQAVFDLI KDIPNVHWVD AQKYLPKNTV EIYGRYLYGD QDHLTYFGSY

601 YMGREFHKHE RLLKSSRDGA LQ*

ORF128a and ORF128-1 show 99.5% identity in 622 aa overlap:

orf128a.pep   MQAVRYRPEIDGLRAVAVLSVMIFHLNNRWLPGGFLGVDIFFVISGFLITGIILSEIQNG
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128-1      MQAVRYRPEIDGLRAVAVLSVMIFHLNNRWLPGGFLGVDIFFVISGFLITGIILSEIQNG

orf128a.pep   SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGF
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128-1      SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGF

orf128a.pep   QQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISIILFLILTA
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128-1      QQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISIILFLILTA

orf128a.pep   TSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLC
              :|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128-1      SSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLC

orf128a.pep   FGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128-1      FGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY

orf128a.pep   SLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKRKMTFKKAF
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128-1      SLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKRKMTFKKAF

orf128a.pep   FCLYLAPSLILVGYNLYARGILKQEHLRPLPGAPLAAENHFPETVLTLGDSHAGHLRGFL
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128-1      FCLYLAPSLILVGYNLYARGILKQEHLRPLPGAPLAAENHFPETVLTLGDSHAGHLRGFL

orf128a.pep   DYVGSREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128-1      DYVGSREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ

orf128a.pep   PVPRFEAQSFLIPGFPARFRETVKRIAAVKPVYVFANNTSISRSPLREEKLKRFAANQYL
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128-1      PVPRFEAQSFLIPGFPARFRETVKRIAAVKPVYVFANNTSISRSPLREEKLKRFAANQYL

orf128a.pep   RPIQAMGDIGKSNQAVFDLIKDIPNVHWVDAQKYLPKNTVEIYGRYLYGDQDHLTYFGSY
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128-1      RPIQAMGDIGKSNQAVFDLIKDIPNVHWVDAQKYLPKNTVEIYGRYLYGDQDHLTYFGSY

orf128a.pep   YMGREFHKHERLLKSSRDGALQX
              ||||||||||||||||: |||||
orf128-1      YMGREFHKHERLLKSSHGGALQX

Homology with a predicted ORF from N. gonorrhoeae

ORF128 shows 93.4% identity over a 244 aa overlap with a predicted ORF (ORF128ng) from N.
gonorrhoeae:

orf128.pep                                  VSLASVIASQIFLYEDFNQMRKTVELSAVF 30
                                            |||||||||||||||||||||||:|||:||
orf128ng      ILSEIQNGSFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTIELSTVF 112

orf128.pep    LSNIYLGFQQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISI 90
              ||||||||: ||||||||||||||||||||||||||||||||||| ||||||||||||||
orf128ng      LSNIYLGFRLGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCYKKTKSLRVLRNISI 172

orf128.pep    ILFLILTASSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGK 150
              |||||||||||||:||||||||||||||||||||||||:||||||||||||||||| |||
orf128ng      ILFLILTASSFLPAGFYTDILNQPNTYYLSTLRFPELLVGSLLAVYGQTQNGRRQTENGK 232

orf128.pep    RQLLSSLCFGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPI 210
              ||||| |||||||:||||||||:|||||:|||||||||||||||||||||||||||||||
orf128ng      RQLLSLLCFGALLVCLFVIDKHDPFIPGITLLLPCLLTALLIRSMQYGTLPTRILSASPI 292

orf128.pep    VFVGKISYSLYLYHWIFIAFAPLIRGGKQLGLPA 244
              |||||||||||||||||||||  | | |||||||
orf128ng      VFVGKISYSLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKR 352

The complete length ORF128ng nucleotide sequence <SEQ ID 833> is: 

1 ATGCAAGCTG TCCGATACAG GCCTGAAATT GACGGATTGC GGGCCGTCGC 

51 CGTGCTATCC GTCATTATTT TCCACCTGAA TAACCGCTGG CTGCCCGGAG 

101 GATTCCTGGG GGTGGACATT TTCTTTGTCA TCTCGGGATT CCTCATTACC 

151 AACATCATTC TTTCTGAAAT ACAGAACGGT TCTTTTTCTT TCCGGGATTT 

201 TTATACCCGC AGGATTAAGC GGATTTATCC TGCTTTTATT GCGGCCGTGT 

251 CCCTGGCTTC GGTGATTGCT TCTCAAATCT TCCTTTACGA AGATTTCAAC 

301 CAAATGAGGA AAACCATAGA GCTTTCTACG GTTTTTTTGT CCAATATTTA 

351 TTTGGGGTTC CGATTGGGGT ATTTCGATTT GAGTGCCGAC GAGAACCCCG 

401 TACTGCATAT CTGGTCTTTG GCGGTAGAGG AACAGTATTA CCTCCTGTAT

451 CCTCTTTTGC TGATATTCTG TTACAAAAAA ACCAAATCAC TACGGGTGCT

501 GCGTAATATC AGCATCATCC TGTTTCTGAT TTTGACCGCA TCATCGTTTT

551 TGCCGGCCGG GTTTTATACC GACATCCTCA ACCAACCcaa TACTTATTAC 

601 CTTTCGACAC TGAGGTTTCC CGAGCTGTTG GTGGGTTCGC TGTTGGCGGT 

651 TTACGGGCAA ACGCAAAACG GCAGACGGCA AACAGAAAAT GGAAAACGGC 

701 AGTTGCTTTC ATTACTCTGT TTCGGCGCat tgCTTGTCTG CCTGTTCGTG 

751 ATCGACAAAC ACGATCCGTT TATCCCGGGA ATAACCCTGC TCCTTCCCTG

801 CCTGCTGACG GCGCTGCTTA TCCGGAGTAT GCAATACGGG ACACTTCCGA 

851 CCCGCATCCT GTCGGCAAGC CCCATCGTAT TTGTCGGCAA AATCTCTTAT 

901 TCCCTATACC TGTACCATTG GATTTTTATT GCCTTCGCCC ATTACATTAC 

951 AGGCGACAAA CAGCTCGGAC TGCCTGCCGT ATCGGCGGTT GCCGCGTTGA 

1001 CGGCCGGATT TTCCCTGTTG AGCTATTATT TGATTGAACA GCCGCTTAGA 

1051 AAACGGAAGA TGACCTTCAA AAAGGCATTT TTCTGCCTTT ATCTCGCCCC 

1101 GTCCCTGATG CTTGTCGGTT ACAACCTGTA TTCAAGAGGG ATATTGAAAC 

1151 AGGAACACCT CCGCCCGCTG CCCGGCACGC CCGTTGCTGC GGAAAATAAT 

1201 TTTCCGGAAA CCGTCTTGAC CCTCGGCGAC TCGCACGCCG GACACCTGCG 

1251 GGGGTTTCTG GATTATGTCG GCGGCAGGGA AGGGTGGAAA GCTAAAATCC 

1301 TGTCCCTCGA TTCGGAGTGT TTGGTTTGGG TGGATGAGAA GCTGGCAGAC

1351 AACCCGTTGT GCCGAAAATA CCGGGATGAA GTTGAAAAAG CCGAAGCTGT 

1401 TTTCATTGCC CAATTCTATG ATTTGAGGAT GGGCGGCCAG CCCGTGCCGA 

1451 GATTTGAAGC GCAATCCTTC CTGATACCCG GGTTCAAAGC CCGATTCAGG 

1501 GAAACCGTCA AGAGGATAGC CGCCGTCAAA CCTGTATATG TTTTTGCAAA 

1551 CAATACATCA ATCAGCCGTT CTCCCTTGAG GGAGGAAAAA TTGAAAAGAT 

1601 TTGCTATAAA CCAATACCTC CGGCCTATTC GGGCTATGGG CGACATCGGC 

1651 AAGAGCAATC AGGCGGTCTT TGATTTGGTT AAAGATATTC CCAATGTGCA 

1701 TTGGGTGGAC GCACAAAAAT ACCTGCCCAA AAACACGGTC GAAATACACG 

1751 GACGCTATCT TTACGGCGAC CAAGACCACC TGACCTATTT CGGTTCTTAT 

1801 TATATGGGGC GGGAATTTCA CAAACACGAA CGCCTGCTCA AGCATTCCCG 

1851 AGGCGGCGCA TTGCAGTAG 

This encodes a protein having amino acid sequence <SEQ ID 834>: 



1 MQAVRYRPEI DGLRAVAVLS VIIFHLNNRW LPGGFLGVDI FFVISGFLIT

51 NIILSEIQNG SFSFRDFYTR RIKRIYPAFI AAVSLASVIA SQIFLYEDFN

101 QMRKTIELST VFLSNIYLGF RLGYFDLSAD ENPVLHIWSL AVEEQYYLLY

151 PLLLIFCYKK TKSLRVLRNI SIILFLILTA SSFLPAGFYT DILNQPNTYY

201 LSTLRFPELL VGSLLAVYGQ TQNGRRQTEN GKRQLLSLLC FGALLVCLFV

251 IDKHDPFIPG ITLLLPCLLT ALLIRSMQYG TLPTRILSAS PIVFVGKISY

301 SLYLYHWIFI AFAHYITGDK QLGLPAVSAV AALTAGFSLL SYYLIEQPLR

351 KRKMTFKKAF FCLYLAPSLM LVGYNLYSRG ILKQEHLRPL PGTPVAAENN

401 FPETVLTLGD SHAGHLRGFL DYVGGREGWK AKILSLDSEC LVWVDEKLAD

451 NPLCRKYRDE VEKAEAVFIA QFYDLRMGGQ PVPRFEAQSF LIPGFKARFR

501 ETVKRIAAVK PVYVFANNTS ISRSPLREEK LKRFAINQYL RPIRAMGDIG

551 KSNQAVFDLV KDIPNVHWVD AQKYLPKNTV EIHGRYLYGD QDHLTYFGSY

601 YMGREFHKHE RLLKHSRGGA LQ*

ORF128ng and ORF128-1 show 95.7% identity in 622 aa overlap:

orf128-1.pep  MQAVRYRPEIDGLRAVAVLSVMIFHLNNRWLPGGFLGVDIFFVISGFLITGIILSEIQNG
              |||||||||||||||||||||:||||||||||||||||||||||||||||:|||||||||
orf128ng      MQAVRYRPEIDGLRAVAVLSVIIFHLNNRWLPGGFLGVDIFFVISGFLITNIILSEIQNG

orf128-1.pep  SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGF
              |||||||||||||||||||||||||||||||||||||||||||||:|||:||||||||||
orf128ng      SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTIELSTVFLSNIYLGF

orf128-1.pep  QQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISIILFLILTA
              : ||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||
orf128ng      RLGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCYKKTKSLRVLRNISIILFLILTA

orf128-1.pep  SSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLC
              |||||:||||||||||||||||||||||||:||||||||||||||||| |||||||| ||
orf128ng      SSFLPAGFYTDILNQPNTYYLSTLRFPELLVGSLLAVYGQTQNGRRQTENGKRQLLSLLC

orf128-1.pep  FGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY
              |||||:||||||||:|||||:|||||||||||||||||||||||||||||||||||||||
orf128ng      FGALLVCLFVIDKHDPFIPGITLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY

orf128-1.pep  SLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKRKMTFKKAF
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128ng      SLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKRKMTFKKAF

orf128-1.pep  FCLYLAPSLILVGYNLYARGILKQEHLRPLPGAPLAAENHFPETVLTLGDSHAGHLRGFL
              |||||||||:|||||||:||||||||||||||:|:||||:||||||||||||||||||||
orf128ng      FCLYLAPSLMLVGYNLYSRGILKQEHLRPLPGTPVAAENNFPETVLTLGDSHAGHLRGFL

orf128-1.pep  DYVGSREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ
              ||||:|||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128ng      DYVGGREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ

orf128-1.pep  PVPRFEAQSFLIPGFPARFRETVKRIAAVKPVYVFANNTSISRSPLREEKLKRFAANQYL
              ||||||||||||||| ||||||||||||||||||||||||||||||||||||||| ||||
orf128ng      PVPRFEAQSFLIPGFKARFRETVKRIAAVKPVYVFANNTSISRSPLREEKLKRFAINQYL

orf128-1.pep  RPIQAMGDIGKSNQAVFDLIKDIPNVHWVDAQKYLPKNTVEIYGRYLYGDQDHLTYFGSY
              |||:|||||||||||||||:||||||||||||||||||||||:|||||||||||||||||
orf128ng      RPIRAMGDIGKSNQAVFDLVKDIPNVHWVDAQKYLPKNTVEIHGRYLYGDQDHLTYFGSY

orf128-1.pep  YMGREFHKHERLLKSSHGGALQX
              |||||||||||||| |:||||||
orf128ng      YMGREFHKHERLLKHSRGGALQX
                     610       620



In addition, ORF128ng shows homology to a hypothetical H. influenzae protein:

sp|P43993|Y392_HAEIN HYPOTHETICAL PROTEIN HI0392 >gi|1074385|pir||B64007
hypothetical protein HI0392 - Haemophilus influenzae (strain Rd KW20)
>gi|1573364 (U32723) H. influenzae predicted coding region HI0392 [Haemophilus
influenzae] Length = 245
Score = 239 bits (604), Expect = 3e-62
Identities = 124/225 (55%), Positives = 152/225 (67%), Gaps = 1/225 (0%)

Query:  38 VDIFFVISGFLITNIILSEIQNGSFSFRDFYTRRIKRIYPXXXXXXXXXXXXXXXXFLYE 97
           +DIFFVISGFLIT II++EIQ  SFS + FYTRRIKRIYP                F+Y
Sbjct:   1 MDIFFVISGFLITGIIITEIQQNSFSLKQFYTRRIKRIYPAFITVMALVSFIASAIFIYN 60

Query:  98 DFNQMRKTIELSTVFLSNIYLGFRLGYFDLSADENPVLHIWSLAVEEQXXXXXXXXXIFC 157
           DFN++RKTIEL+  FLSN YLG   GYFDLSA+ENPVLHIWSLAVE Q         I
Sbjct:  61 DFNKLRKTIELAIAFLSNFYLGLTQGYFDLSANENPVLHIWSLAVEGQYYLIFPLILILA 120

Query: 158 YKKTKSLRVLRNISIILFLILTASSFLPAGFYTDILNQPNTYYLSTLRFPELLVGSLLAV 217
           YKK + ++VL  I++ILF IL A+SF+ A FY ++L+QPN YYLS LRFPELLVGSLLA+
Sbjct: 121 YKKFREVKVLFIITLILFFILLATSFVSANFYKEVLHQPNIYYLSNLRFPELLVGSLLAI 180

Query: 218 YGQTQNGRRQTENGKRQLLSLLCFGALLVCLFVIDKHDPFIPGIT 262
           Y    N + Q       +L++L    L  CLF+++ +  FIPGIT
Sbjct: 181 YHNLSN-KVQLSKQVNNILAILSTLLLFSCLFLMNNNIAFIPGIT 224

This analysis, including the identification of several putative transmembrane domains, suggests that
these proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens
for vaccines or diagnostics, or for raising antibodies.
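
The identity count reported for the HI0392 comparison (124/225) can be checked directly from the two aligned rows. The following Python sketch is an illustration rather than the method used in the original analysis; the function name `alignment_stats` is hypothetical, and residues shown as `X` (masked in the query) are simply treated as mismatches.

```python
# Aligned query (ORF128ng, residues 38-262) and subject (HI0392, residues 1-224).
QUERY = (
    "VDIFFVISGFLITNIILSEIQNGSFSFRDFYTRRIKRIYPXXXXXXXXXXXXXXXXFLYE"
    "DFNQMRKTIELSTVFLSNIYLGFRLGYFDLSADENPVLHIWSLAVEEQXXXXXXXXXIFC"
    "YKKTKSLRVLRNISIILFLILTASSFLPAGFYTDILNQPNTYYLSTLRFPELLVGSLLAV"
    "YGQTQNGRRQTENGKRQLLSLLCFGALLVCLFVIDKHDPFIPGIT"
)
SBJCT = (
    "MDIFFVISGFLITGIIITEIQQNSFSLKQFYTRRIKRIYPAFITVMALVSFIASAIFIYN"
    "DFNKLRKTIELAIAFLSNFYLGLTQGYFDLSANENPVLHIWSLAVEGQYYLIFPLILILA"
    "YKKFREVKVLFIITLILFFILLATSFVSANFYKEVLHQPNIYYLSNLRFPELLVGSLLAI"
    "YHNLSN-KVQLSKQVNNILAILSTLLLFSCLFLMNNNIAFIPGIT"
)


def alignment_stats(query: str, sbjct: str):
    """Return (identities, gap columns, alignment length) for two aligned rows."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    identities = sum(q == s and q != "-" for q, s in zip(query, sbjct))
    gaps = sum(q == "-" or s == "-" for q, s in zip(query, sbjct))
    return identities, gaps, len(query)


identities, gaps, length = alignment_stats(QUERY, SBJCT)
print(f"Identities = {identities}/{length} ({100 * identities / length:.1f}%), Gaps = {gaps}/{length}")
# Identities = 124/225 (55.1%), Gaps = 1/225
```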

Example 99 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 835>:

1 . . ATTATTTACG AATACCGCTG GATGTTTCTT TACGGCGCAC TGACGACCTT 

51 GGGGCTGACG GTCGTGGCAA C.GCGGGCGG TTCGGTATTG GGTCTGTTGT

101 TGGCGTTGGC GCGCCTGATT CACTTGGAAA AAGCCGGTGC GCCGATGCGC 

151 GTGCTGGCGT GGGCGTTGCG TAAAGTTTCG CTGCTGTATG TTACGCTGTT 

201 CCGGGGTACG CCGCTGTTTG TGCAGATTGT GATTTGGGCG TATGTGTGGT 

251 TTCCGTTTTT CGTC. . 

This corresponds to the amino acid sequence <SEQ ID 836; ORF129>: 

1 ..IIYEYRWMFL YGALTTLGLT VVAXAGGSVL GLLLALARLI HLEKAGAPMR
51 VLAWALRKVS LLYVTLFRGT PLFVQIVIWA YVWFPFFV..

Further work revealed the complete nucleotide sequence <SEQ ID 837>: 

1 ATGGATTTTC GTTTTGACAT TATTTACGAA TACCGCTGGA TGTTTCTTTA 

51 CGGCGCACTG ACGACCTTGG GGCTGACGGT CGTGGCAACG GCGGGCGGTT 

101 CGGTATTGGG TCTGTTGTTG GCGTTGGCGC GCCTGATTCA CTTGGAAAAA 

151 GCCGGTGCGC CGATGCGCGT GCTGGCGTGG GCGTTGCGTA AAGTTTCGCT 

201 GCTGTATGTT ACGCTGTTCC GGGGTACGCC GCTGTTTGTG CAGATTGTGA 

251 TTTGGGCGTA TGTGTGGTTT CCGTTTTTCG TCCATCCTTC AGACGGCATT 

301 TTGGTCAGCG GCGAGGCGGC AATCGCGCTG CGTCGCGGAT ACGGGCCGCT 

351 GATTGCCGGT TCTTTGGCAC TGATCGCCAA CTCGGGGGCG TATATCTGTG 

401 AGATTTTCCG CGCGGGCATC CAGTCTATAG ACAAAGGACA GATGGAGGCG

451 GCGCGTTCTT TGGGGCTGAC CTATCCGCAG GCGATGCGCT ATGTGATTCT 

501 GCCGCAGGCA TTGCGCCGCA TGCTGCCGCC TTTGGCGAGC GAGTTCATCA 

551 CGCTCTTGAA AGACAGCTCG CTGCTGTCGG TCATTGCTGT GGCGGAGTTG 

601 GCGTATGTTC AGAATACGAT TACGGGCCGG TATTCGGTTT ATGAAGAACC 

651 GCTTTACACC GTCGCCCTGA TTTATCTGTT GATGACGACT TTCTTAGGCT 

701 GGATATTCCT GCGTTTGGAA AAACGTTACA ATCCGCAACA CCGCTGA 

This corresponds to the amino acid sequence <SEQ ID 838; ORF129-1>:

1 MDFRFDIIYE YRWMFLYGAL TTLGLTVVAT AGGSVLGLLL ALARLIHLEK

51 AGAPMRVLAW ALRKVSLLYV TLFRGTPLFV QIVIWAYVWF PFFVHPSDGI

101 LVSGEAAIAL RRGYGPLIAG SLALIANSGA YICEIFRAGI QSIDKGQMEA

151 ARSLGLTYPQ AMRYVILPQA LRRMLPPLAS EFITLLKDSS LLSVIAVAEL

201 AYVQNTITGR YSVYEEPLYT VALIYLLMTT FLGWIFLRLE KRYNPQHR*
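
The relationship between the partial sequence (SEQ ID 835) and the complete sequence (SEQ ID 837) can be illustrated by locating the start of the former inside the latter. This is a minimal sketch that uses only the first rows of each listing; the variable names are illustrative.

```python
# First 50 nucleotides of the partial ORF129 listing (SEQ ID 835, row 1).
PARTIAL_START = "ATTATTTACGAATACCGCTGGATGTTTCTTTACGGCGCACTGACGACCTT"

# First 100 nucleotides of the complete listing (SEQ ID 837, rows 1 and 51).
COMPLETE_START = (
    "ATGGATTTTCGTTTTGACATTATTTACGAATACCGCTGGATGTTTCTTTA"
    "CGGCGCACTGACGACCTTGGGGCTGACGGTCGTGGCAACGGCGGGCGGTT"
)

offset = COMPLETE_START.find(PARTIAL_START)  # 0-based index, -1 if absent
codon = offset // 3 + 1                      # 1-based codon that contains the match start
print(f"partial sequence begins at nucleotide {offset + 1} "
      f"(codon {codon}, frame offset {offset % 3})")
# partial sequence begins at nucleotide 19 (codon 7, frame offset 0)
```

The in-frame start at codon 7 agrees with the partial protein beginning at the seventh residue (IIYEY...) of ORF129-1.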

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A)

ORF129 shows 98.9% identity over a 88 aa overlap with an ORF (ORF129a) from strain A of N.
meningitidis:

                            10        20        30        40        50
orf129.pep          IIYEYRWMFLYGALTTLGLTVVAXAGGSVLGLLLALARLIHLEKAGAPMRVLAW
                    |||||||||||||||||||||||:||||||||||||||||||||||||||||||
orf129a       MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW
                      10        20        30        40        50        60

                  60        70        80
orf129.pep    ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFV
              ||||||||||||||||||||||||||||||||||
orf129a       ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG
                      70        80        90       100       110       120

orf129a       SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS
                     130       140       150       160       170       180

The complete length ORF 129a nucleotide sequence <SEQ ID 839> is: 

1 ATGGATTTTC GTTTTGACAT TATTTACGAA TACCGCTGGA TGTTTCTTTA 

51 CGGCGCACTG ACGACCTTGG GGCTGACGGT CGTGGCGACG GCGGGCGGTT 

101 CGGTATTGGG TCTGTTGTTG GCGTTGGCGC GCCTGATTCA CTTGGAAAAA 

151 GCCGGTGCGC CGATGCGCGT GCTGGCGTGG GCGTTGCGTA AGGTTTCGCT 

201 GCTGTATGTT ACGCTGTTCC GGGGTACGCC GCTGTTTGTG CAGATTGTGA 

251 TTTGGGCGTA TGTGTGGTTT CCGTTTTTCG TCCATCCTTC AGACGGCATT 

301 TTGGTTAGCG GCGAGGCGGC AATCGCGCTG CGTCGCGGAT ACGGGCCGCT 

351 GATTGCCGGT TCTTTGGCAC TGATCGCCAA CTCGGGGGCG TATATCTGTG 

401 AGATTTTCCG CGCGGGCATC CAGTCTATAG ACAAAGGACA GATGGAGGCG

451 GCGCGTTCTT TGGGGCTGAC CTATCCGCAG GCGATGCGCT ATGTGATTCT

501 GCCGCAGGCA TTGCGCCGTA TGCTGCCGCC TTTGGCGAGC GAGTTCATCA 

551 CGCTCTTGAA AGACAGCTCG CTGCTGTCGG TCATTGCTGT GGCGGAGTTG 

601 GCGTATGTTC AGAATACGAT TACGGGCCGG TATTCGGTTT ATGAAGAACC 

651 GCTTTACACC GTCGCCCTGA TTTATCTGTT GATGACGACT TTCTTAGGCT 

701 GGATATTCCT GCGTTTGGAA AAACGTTACA ATCCGCAACA CCGCTGA

This encodes a protein having amino acid sequence <SEQ ID 840>: 

1 MDFRFDIIYE YRWMFLYGAL TTLGLTVVAT AGGSVLGLLL ALARLIHLEK

51 AGAPMRVLAW ALRKVSLLYV TLFRGTPLFV QIVIWAYVWF PFFVHPSDGI

101 LVSGEAAIAL RRGYGPLIAG SLALIANSGA YICEIFRAGI QSIDKGQMEA

151 ARSLGLTYPQ AMRYVILPQA LRRMLPPLAS EFITLLKDSS LLSVIAVAEL

201 AYVQNTITGR YSVYEEPLYT VALIYLLMTT FLGWIFLRLE KRYNPQHR*

ORF129a and ORF129-1 show 100.0% identity in 248 aa overlap:

orf129a.pep   MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129-1      MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW

orf129a.pep   ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129-1      ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG

orf129a.pep   SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129-1      SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS

orf129a.pep   EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTVALIYLLMTTFLGWIFLRLE
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129-1      EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTVALIYLLMTTFLGWIFLRLE

orf129a.pep   KRYNPQHRX
              |||||||||
orf129-1      KRYNPQHRX

Homology with a predicted ORF from N. gonorrhoeae

ORF129 shows 98.9% identity over a 88 aa overlap with a predicted ORF (ORF129ng) from
N. gonorrhoeae:

orf129.pep          IIYEYRWMFLYGALTTLGLTVVAXAGGSVLGLLLALARLIHLEKAGAPMRVLAW 54
                    |||||||||||||||||||||||:||||||||||||||||||||||||||||||
orf129ng      MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW 60

orf129.pep    ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFV 88
              ||||||||||||||||||||||||||||||||||
orf129ng      ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVILHTAFLGNAMRQSRRVPDKGRWIAG 120

An ORF129ng nucleotide sequence <SEQ ID 841> was predicted to encode a protein having amino
acid sequence <SEQ ID 842>:

1 MDFRFDIIYE YRWMFLYGAL TTLGLTVVAT AGGSVLGLLL ALARLIHLEK

51 AGAPMRVLAW ALRKVSLLYV TLFRGTPLFV QIVIWAYVWF PFFVILHTAF

101 LGNAMRQSRR VPDKGRWIAG SLELNCQPRG RKTRGEFPPG ESNLGTEPRN

151 PLSMGQRRFP GCENWYPPQN FIKK*

Further work revealed the following gonococcal sequence <SEQ ID 843>: 

1 ATGGATTTTc gtTTTGACAT TATTTAcgaA TACCGCTGGA TGTTTCTTTA 

51 CGGCGCACTG Acgaccttgg ggctgacggt cgtggcgacg gCGGGCGGTT 

101 CGGtattggG TCTGTTGTTG GCGTTGGCGC GCCTGATTCA CTTGGAAAAA 

151 GCCGGTGCGC CGATGCGCGT GCTGGCGTGG GCGTTGCGTA AGGTTTCGCT 

201 GCTGTACGTT ACCCTGTTCC GGGGTACGCC GCTGTTTGTG CAGATTGTGA 

251 TTTGGGCGTA TGTGTGGTTT CCGTTTTTCG TCCATCCTTC AGACGGCATT 

301 TTGGTCAGCG GCGAGGCGGC AATCGCGCTG CGTCGCGGAT ACGGGCCGCT 

351 GATTGCCGGT TCTTTGGCAC TGATCGCCAA CTCGGGGGCG TATATCTGTG 

401 AGATTTTCCG CGCGGGCATC CAGTCTATAG ACAAAGGACA GATGGAGGCG 

451 GCGTGTTCTT TGGGACTGAC CTATCCGCAG GCGATGCGCT ATGTGATTCT

501 GCCGCAGGCA TTGCGCCGTA TGCTGCCGCC TTTGGCGAGC GAGTTCATCA 

551 CGCTCTTGAA AGACAGCTCG CTGCTGTCGG TCATTGCTGT GGCGGAGTTG 

601 GCGTATGTTC AGAATACGAT TACGGGCCGG TATTCGGTTT ATGAAGAACC 

651 GCTTTACACC GCCGCCCTGA TTTATCTGTT GATGACGACT TTCTTAGGCT 

701 GGATATTCCT GCGTTTGGAA AAACGTTACA ATCCGCAACA CCGCTGA 

This corresponds to the amino acid sequence <SEQ ID 844; ORF129ng-1>:

1 MDFRFDIIYE YRWMFLYGAL TTLGLTVVAT AGGSVLGLLL ALARLIHLEK

51 AGAPMRVLAW ALRKVSLLYV TLFRGTPLFV QIVIWAYVWF PFFVHPSDGI

101 LVSGEAAIAL RRGYGPLIAG SLALIANSGA YICEIFRAGI QSIDKGQMEA

151 ARSLGLTYPQ AMRYVILPQA LRRMLPPLAS EFITLLKDSS LLSVIAVAEL

201 AYVQNTITGR YSVYEEPLYT VALIYLLMTT FLGWIFLRLE KRYNPQHR*

ORF129ng-1 and ORF129-1 show 99.2% identity in 248 aa overlap:

orf129-1.pep  MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129ng-1    MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW

orf129-1.pep  ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129ng-1    ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG

orf129-1.pep  SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS
              ||||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||
orf129ng-1    SLALIANSGAYICEIFRAGIQSIDKGQMEAACSLGLTYPQAMRYVILPQALRRMLPPLAS

orf129-1.pep  EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTVALIYLLMTTFLGWIFLRLE
              ||||||||||||||||||||||||||||||||||||||||:|||||||||||||||||||
orf129ng-1    EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTAALIYLLMTTFLGWIFLRLE

orf129-1.pep  KRYNPQHRX
              |||||||||
orf129ng-1    KRYNPQHRX
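
The identity figures quoted for these pairwise comparisons can be reproduced from the aligned strings. The following is a minimal Python sketch, not the program used to generate the alignments in this document; the function name `percent_identity` and the gap character are illustrative assumptions. It simply counts identical columns over the aligned overlap.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of aligned columns holding the same residue in both sequences.

    Both strings must already be aligned (equal length). Columns containing
    a gap character count towards the overlap but never as identities.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * identical / len(aligned_a)


# Last full-width block of the ORF129-1 / ORF129ng-1 comparison.
block_129_1 = "EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTVALIYLLMTTFLGWIFLRLE"
block_129ng_1 = "EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTAALIYLLMTTFLGWIFLRLE"
print(round(percent_identity(block_129_1, block_129ng_1), 1))
```

Over the whole 248 aa overlap the alignment shows two mismatches, and 246/248 rounds to the 99.2% quoted.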

In addition, ORF129ng-1 is homologous to an ABC transporter from A. fulgidus:

2650409 (AE001090) glutamine ABC transporter, permease protein (glnP) 
[Archaeoglobus fulgidus ] Length = 224 
Score = 132 bits (329), Expect = 2e-30 

Identities = 86/178 (48%), Positives = 103/178 (57%), Gaps = 18/178 (10%) 

Query: 65 VSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAGSLAL 124 

+S YV + RGTPL VQI + I +F P+ GI + E A G +AL 

Sbjct: 58 I S TAYVE VI RGT PLLVQI L I VYFGLPAIGINLQPEPA GIIAL 99 
Query: 125 IANSGAYICEIFRAGIQSIDKGQMEAACSLGLTYPQAMRYVILPQALRRMLPPLASEFIT 184

SGAYI EI RAGI+SI GQMEAA SLG+TY QAMRYVI PQA R +LP L +EFI 
Sbjct: 100 SICSGAYIAEIVRAGIESIPIGQMEAARSLGMTYLQAMRYVIFPQAFRNILPALGNEFIA 159

Query: 185 LLKDSSLLSVIAVAEIAYVQNTITGRYSVYEEPLYTAALIYLLMTTFLGWIFLRLEKR 242 

LLKDSSLLSVI++ EL V I P AL YL+MT L + +K+ 

Sbjct: 160 LLKDSSLLSVISIVELTRVGRQIVNTTFNAWTPFLGVALFYLMMTIPLSRLVAYSQKK 217 
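
The summary line of the database hit above (identities, positives, gaps) can be read mechanically when many hits have to be compared. The sketch below is an illustration only; the helper name `parse_blast_summary` is hypothetical, and it assumes the summary is available as a single text line in the form shown in this document.

```python
import re

# Matches fields such as "Identities = 86/178" in a hit summary line.
SUMMARY_FIELD = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)")


def parse_blast_summary(line: str) -> dict:
    """Map each field name to (count, alignment_length, fraction)."""
    return {
        name: (int(count), int(length), int(count) / int(length))
        for name, count, length in SUMMARY_FIELD.findall(line)
    }


summary = parse_blast_summary(
    "Identities = 86/178 (48%), Positives = 103/178 (57%), Gaps = 18/178 (10%)"
)
for name, (count, length, fraction) in summary.items():
    print(f"{name}: {count}/{length} = {fraction:.3f}")
```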

This analysis, including the identification of transmembrane domains in the two proteins, suggests
that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful
antigens for vaccines or diagnostics, or for raising antibodies.



Example 100 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 845>: 



1 . . CTGAAAGAAT GCCGTCTGAA AGACCCTGTT TTTATTCCAA ATATCGTTTA 

51 TAAGAACATC GCCATTACTT TCCTGCTCTT GCACGCCGCC GCCGAACTTT 

101 GGCTGCCCGC GCAAACCGCC GGTTTTACCG CGCTCGCCGT CGGCTTCATC 

151 CTGCTCGCCA AGCTGCGTGA gCTTCACCAT CACGAACTCT TACGTAAACA 

201 cTACGTCCGC ACTTATTACy TGCTCCAACT CTTTGCCGCC GCAGgcTAgT 

251 TTGTGGACAG GCGCGGCGwA ATTACAAAAC CTGCCCGCyT CCGCGCCCCT 

301 GCACCTGATT ACCCTCGGCG GCATGATGGG CGGCGTGATG ATGGTGTGGc 

351 TGACCGCCGG ACTGTGGCAC AGCGGCTTTA CCAAACTCGA CTACCCCAAA 

401 CTCTGCCGCA TTGCCGTCCC CATCCTTTTC GCCGCCGCCG TCTCGCGCGC 

451 TTTCTTGrTG AACGTGAACC CGrTATTTTT CATTACCGTT CCTGCGATTC

501 TGACCGCCGC CGTATTCGTA CTGTATCTTT TCrCGTTTAT ACCGATATTT 

551 CGGGCGAATG CGTTTACAGA CGATCCGGAr TAr 



This corresponds to the amino acid sequence <SEQ ID 846; ORF130>:

1 ..LKECRLKDPV FIPNIVYKNI AITFLLLHAA AELWLPAQTA GFTALAVGFI
51 LLAKLRELHH HELLRKHYVR TYYLLQLFAA AGSLWTGAAX LQNLPASAPL
101 HLITLGGMMG GVMMVWLTAG LWHSGFTKLD YPKLCRIAVP ILFAAAVSRA
151 FLXNVNPXFF ITVPAILTAA VFVLYLFXFI PIFRANAFTD DPE*

Further work revealed the complete nucleotide sequence <SEQ ID 847>:

1 ATGCGGCCGT TTTTCGTCGG CGCGGCGGTG CTTGCCATAC TCGGTGCGCT
51 GGTGTTTTTC ATCAACCCCG GTGCCATCGT CCTGCACCGC CAAATTTTCT
101 TGGAACTTAT GCTGCCGGCG GCATACGGCG GTTTTTTGAC TGCGGCTTTG
151 TTGGACTGGA CGGGTTTTTC GGGTAACCTG AAACCTGTCG CGACTTTGAT
201 GGCGGCATTA TTGCTCGCCG CATCCGCTAT ACTGCCCTTT TCGCCGCAAA
251 CTGCCTCGTT TTTCGTCGCC GCCTATTGGC TGGTGTTGCT GCTGTTCTGC
301 GCCCGGCTGA TTTGGCTAGA CCGAAACACC GACAACTTCG CCCTGCTAAT
351 GTTACTTGCC GCGTTCACTG TTTTTCAGAC GGCATATGCC GTCAGCGGCG
401 ATTTGAACCT GTTGCGCGCG CAAGTGCATC TAAATATGGC GGCGGTGATG
451 TTCGTATCCG TGCGCGTCAG TATTCTTTTG GGCGCGGAAG CCCTGAAAGA
501 ATGCCGTCTG AAAGACCCTG TTTTTATTCC AAATATCGTT TATAAAAACA
551 TCGCCATTAC TTTCCTGCTC TTGCACGCCG CCGCCGAACT TTGGCTGCCC
601 GCGCAAACCG CCGGTTTTAC CGCGCTCGCC GTCGGCTTCA TCCTGCTCGC
651 CAAGCTGCGT GAGCTTCACC ATCACGAACT CTTACGTAAA CACTACGTCC
701 GCACTTATTA CCTGCTCCAA CTCTTTGCCG CCGCAGGCTA TTTGTGGACA
751 GGCGCGGCGA AATTACAAAA CCTGCCCGCC TCCGCGCCCC TGCACCTGAT
801 TACCCTCGGC GGCATGATGG GCGGCGTGAT GATGGTGTGG CTGACCGCCG
851 GACTGTGGCA CAGCGGCTTT ACCAAACTCG ACTACCCCAA ACTCTGCCGC
901 ATTGCCGTCC CCATCCTTTT CGCCGCCGCC GTCTCGCGCG CTTTCTTGAT
951 GAACGTGAAC CCGATATTTT TCATTACCGT TCCTGCGATT CTGACCGCCG
1001 CCGTATTCGT ACTGTATCTT TTCACGTTTA TACCGATATT TCGGGCGAAT
1051 GCGTTTACAG ACGATCCGGA ATAA

This corresponds to the amino acid sequence <SEQ ID 848; ORF130-1>:

1 MRPFFVGAAV LAILGALVFF INPGAIVLHR QIFLELMLPA AYGGFLTAAL
51 LDWTGFSGNL KPVATLMAAL LLAASAILPF SPQTASFFVA AYWLVLLLFC
101 ARLIWLDRNT DNFALLMLLA AFTVFQTAYA VSGDLNLLRA QVHLNMAAVM
151 FVSVRVSILL GAEALKECRL KDPVFIPNIV YKNIAITFLL LHAAAELWLP
201 AQTAGFTALA VGFILLAKLR ELHHHELLRK HYVRTYYLLQ LFAAAGYLWT
251 GAAKLQNLPA SAPLHLITLG GMMGGVMMVW LTAGLWHSGF TKLDYPKLCR
301 IAVPILFAAA VSRAFLMNVN PIFFITVPAI LTAAVFVLYL FTFIPIFRAN
351 AFTDDPE*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A)

ORF130 shows 94.3% identity over a 193 aa overlap with an ORF (ORF130a) from strain A of N.
meningitidis:

10 20 30 

orfl30 pep LKECRLKDPVFIPNIVYKNIAITFLLLHAA 

I I I I I I I I M I I I I : I I I I I t I I I M M M 
orfl30a LNLLRAQVHLNMAAVMFVSVRVSILLGAEALKECRLKDPVFIPNWYKNIAITFLLLHAA 
140 150 160 170 180 190 



40 50 60 70 80 90 

orf 130 . pep AELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQLFAAAGSLWTGAAX 

I I I I I M t M I 11 : I I II M I II II I I I 1 I I I I I 1 I M M I I M I M I I I M I I I I 11 
orf 130a AE L WL PAQT AG FT S L AVG F I L LAKLRE LHHHE LLRKH YVRT Y YLLQL FAAAG Y LWT GAAK 

200 210 220 230 240 250 



100 110 120 130 140 150 

orf130.pep LQNLPASAPLHLITLGGMMGGVMMVWLTAGLWHSGFTKLDYPKLCRIAVPILFAAAVSRA
I I I I I I II I 1 M I I I M I I I : M I I M II 11 M M I I I I i I I M M I I I II i I I I I I I M 
orf 130a LQNLPASAPLHLITLGGMMGSVMMVWLTAGLWHSGFTKLDYPKLCRIAVPILFAAAVSRA 
260 270 280 290 300 310 



160 170 180 190 

orf 130. pep FLXNVNPXFFITVPAILTAAVFVLYLFXFIPIFRANAFTDDPEX 
1 MM I I I I I M I I I M I I I I I I I : I I II I I II M I I I I 
orf 130a VLMNVNPIFFITVPAILTAAVFVLYLLTFVPIFRANAFTDDPEX 
320 330 340 350 

The complete length ORF 130a nucleotide sequence <SEQ ID 849> is: 



1 ATGCGGCCGT TTTTCGTCGG CGCGGCGGTG CTTGCCATAC TCGGTGCGCT 

51 GGTGTTTTTC ATCAACCCCG GTGCCATCGT CCTGCACCGC CAAATTTTCT 

101 TGGAACTTAT GCTGCCGGCG GCATACGGCG GTTTTTTGAC TGCGGCTTTG 

151 TTGGACTGGA CGGGTTTTTC GGGTAACCTG AAACCTGTCG CGACTTTGAT 

201 GGCGGCATTA TTGCTCGCCG CATCCGCTAT ACTGCCCTTT TCGCCGCAAA 

251 CTGCCTCGTT TTTCGTCGCC GCCTATTGGC TGGTGTTGCT GCTGTTCTGC 

301 GCCCGGCTGA TTTGGCTAGA CCGAAACACC GACAACTTCG CCCTGCTAAT 

351 GTTACTTGCC GCGTTCACTG TTTTTCAGAC GGCATATGCC GTCAGCGGCG 

401 ATTTGAACCT GTTGCGCGCG CAAGTGCATC TAAATATGGC GGCGGTGATG

451 TTCGTATCCG TGCGCGTCAG TATTCTTTTG GGCGCGGAAG CCCTGAAAGA

501 ATGCCGTCTG AAAGACCCAG TATTCATCCC CAATGTCGTC TATAAAAACA 

551 TCGCCATTAC CTTCCTGCTC CTGCACGCCG CCGCCGAACT TTGGCTGCCT 

601 GCGCAAACCG CCGGTTTTAC CTCGCTCGCC GTCGGCTTTA TCCTGCTTGC 

651 CAAGCTGCGT GAGCTTCACC ATCACGAACT CCTGCGCAAA CACTACGTCC 

701 GCACTTATTA CCTGCTCCAA CTCTTTGCCG CCGCAGGCTA TTTGTGGACA 

751 GGCGCGGCGA AATTACAAAA CCTGCCCGCC TCCGCGCCCC TGCACCTGAT

801 TACCCTCGGT GGCATGATGG GCAGCGTGAT GATGGTGTGG CTGACTGCCG 

851 GACTGTGGCA CAGCGGCTTT ACCAAGCTCG ACTACCCGAA ACTCTGCCGC 

901 ATCGCCGTCC CCATCCTNTT CGCCGCCGCC GTTTCGCGCG CTGTTTTAAT 

951 GAACGTAAAC CCGATATTCT TCATCACCGT CCCCGCAATT CTGACCGCCG 

1001 CCGTGTTCGT GCTTTACCTG CTGACATTCG TACCGATCTT TCGGGCGAAC 

1051 GCGTTTACAG ACGATCCGGA ATAA 

This encodes a protein having amino acid sequence <SEQ ID 850>: 



1 MRPFFVGAAV LAILGALVFF INPGAIVLHR QIFLELMLPA AYGGFLTAAL
51 LDWTGFSGNL KPVATLMAAL LLAASAILPF SPQTASFFVA AYWLVLLLFC
101 ARLIWLDRNT DNFALLMLLA AFTVFQTAYA VSGDLNLLRA QVHLNMAAVM
151 FVSVRVSILL GAEALKECRL KDPVFIPNVV YKNIAITFLL LHAAAELWLP
201 AQTAGFTSLA VGFILLAKLR ELHHHELLRK HYVRTYYLLQ LFAAAGYLWT
251 GAAKLQNLPA SAPLHLITLG GMMGSVMMVW LTAGLWHSGF TKLDYPKLCR
301 IAVPILFAAA VSRAVLMNVN PIFFITVPAI LTAAVFVLYL LTFVPIFRAN
351 AFTDDPE*

ORF130a and ORF130-1 show 98.3% identity in 357 aa overlap: 

orf130a.pep  MRPFFVGAAVLAILGALVFFINPGAIVLHRQIFLELMLPAAYGGFLTAALLDWTGFSGNL
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf130-1     MRPFFVGAAVLAILGALVFFINPGAIVLHRQIFLELMLPAAYGGFLTAALLDWTGFSGNL

orf130a.pep  KPVATLMAALLLAASAILPFSPQTASFFVAAYWLVLLLFCARLIWLDRNTDNFALLMLLA
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf130-1     KPVATLMAALLLAASAILPFSPQTASFFVAAYWLVLLLFCARLIWLDRNTDNFALLMLLA

orf130a.pep  AFTVFQTAYAVSGDLNLLRAQVHLNMAAVMFVSVRVSILLGAEALKECRLKDPVFIPNVV
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||:|
orf130-1     AFTVFQTAYAVSGDLNLLRAQVHLNMAAVMFVSVRVSILLGAEALKECRLKDPVFIPNIV

orf130a.pep  YKNIAITFLLLHAAAELWLPAQTAGFTSLAVGFILLAKLRELHHHELLRKHYVRTYYLLQ
             |||||||||||||||||||||||||||:||||||||||||||||||||||||||||||||
orf130-1     YKNIAITFLLLHAAAELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQ

orf130a.pep  LFAAAGYLWTGAAKLQNLPASAPLHLITLGGMMGSVMMVWLTAGLWHSGFTKLDYPKLCR
             ||||||||||||||||||||||||||||||||||:|||||||||||||||||||||||||
orf130-1     LFAAAGYLWTGAAKLQNLPASAPLHLITLGGMMGGVMMVWLTAGLWHSGFTKLDYPKLCR

orf130a.pep  IAVPILFAAAVSRAVLMNVNPIFFITVPAILTAAVFVLYLLTFVPIFRANAFTDDPE
             |||||||||||||| |||||||||||||||||||||||||:||:|||||||||||||
orf130-1     IAVPILFAAAVSRAFLMNVNPIFFITVPAILTAAVFVLYLFTFIPIFRANAFTDDPE

Homology with a predicted ORF from N. gonorrhoeae

ORF130 shows 91.7% identity over a 193 aa overlap with a predicted ORF (ORF130ng) from 
N. gonorrhoeae: 

35 orf 130. pep LKECRLKDPVFI PNIVYKNIAITFLLLHAA 30 

I M I M M M I M I : : I I I I I I I MINI 
orfl30ng LNLLRAQVHLNMAAVMFVSVRVSVLLGTETLKECRLKDPVFIPNVIYKNIAIT-LLLHAA 201 

orf 130 . pep AELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQLFAAAGSLWTGAAX 90 
40 I I M I I I II M ! I I ! I M I M I I I I II I I M I I I I I I I I I I I I I I I I M I I I MINI 

orf 130ng AELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQLFAAAGYLWTGAAK 2 61 

orf 130 . pep LQNLPASAPLHLITLGGMMGGVMMVWLTAGLWHSGFTKLDYPKLCRIAVPILFAAAVSRA 150 
I M II II I I M I II M II I I I I I I I I I I I M I I I I II I M I I II I M I M I I : M II I 
45 orf 130ng LQNLPASAPLHLITLGGMTGGVMMVWLTAGLWHSGFTKLDYPKLCRIAVSILFASAVSRA 321 

orf 130. pep FLXNVNPXFFITVPAILTAAVFVLYLFXFIPIFRANAFTDDPE 193 

I MM M M M M M M M II I : M M M M II M II II 
orf 130ng VLMNVNPIFFITVPEILTAAVFMLYLLTFVPIFRANAFTDDPE 364 

An ORF130ng nucleotide sequence <SEQ ID 851> was predicted to encode a protein having amino
acid sequence <SEQ ID 852>:

1 MNKFFTHPMR PFFVGAAVLA ILGALVFFHQ PRRYHPAPPN FLGTYAAGCI
51 RRFFDYRFVG PDGFFRQPET CRYFDGGVVA CCGCFIAVFT ATCRIFRRRL
101 LAGVAAVLRL ADLARRQHRT LRSVDVTAAF TVFQTAYAVS GDLNLLRAQV
151 HLNMAAVMFV SVRVSVLLGT ETLKECRLKD PVFIPNVIYK NIAITLLLHA
201 AAELWLPAQT AGFTALAVGF ILLAKLRELH HHELLRKHYV RTYYLLQLFA
251 AAGYLWTGAA KLQNLPASAP LHLITLGGMT GGVMMVWLTA GLWHSGFTKL
301 DYPKLCRIAV SILFASAVSR AVLMNVNPIF FITVPEILTA AVFMLYLLTF
351 VPIFRANAFT DDPE*

Further work revealed the following gonococcal DNA sequence <SEQ ID 853>:

1 ATGCGCCCGT TTTTCGTCGG TGCGGCAGTA CTTGCCATAC TCGGTGCGTT 

51 GGTGTTTTTT ATCAACCCCG GCGCTATCAT CCTGCACCGC CAAATTTTCT 

101 TGGAACTTAT GCTGCCGGCT GCATACGGCG GTTTTTTGAC TACCGCTTTG 

151 TTGGACCGGA CGGGTTTTTC AGGCAACCTG AAACCTGCCG CTACTTTGAT 

201 GGCGGTGTTG TTGCTTGTTG CGGCTGTTTT ATTGCCGTTT TTACCGCAAC 

251 TTGCCGCATT TTTCGTCGCC GCCTATTGGC TGGTGTTGCT GCTGTTCTGC 

301 GCCTGGCTGA TTTGGCTCGA CCGCAACACC GACAACTTCG CTCTGTTGAT 

351 GTTACTTGCC GCATTTACCG TTTTTCAGAC GGCCTATGCC GTCAGCGGCG 

401 ATTTGAACTT ACTGCGCGCG CAAGTGCATT TGAATATGGC GGCGGTCATG

451 TTCGTATCCG TCCGCGTCAG CGTCCTTTTG GGCACGGAAA CCCTGAAAGA

501 ATGCCGTCTG AAAGACCCCG TATTCATCCC CAACGTTATC TATAAAAACA 

551 TCGCCATCAC CCTGCTGCTG CACGCCGCCG CCGAACTTTG GCTGCCCGCG 

601 CAAACCGCCG GTTTTACTGC GCTTGCCGTC GGCTTCATCC TGCTCGCCAA 

651 GCTGCGCGAA CTGCACCATC ACGAACTCTT ACGCAAACAC TACGTCCGCA 

701 CTTATTACCT GCTCCAGCTC TTTGCCGCCG CAGGTTATCT GTGGACAGGC

751 GCGGCGAAAC TGCAAAACCT GCCCGCCTCC GCGCCCCTGC ACCTGATTAC 

801 CCTCGGCGGC ATGACGGGTG GCGTGATGAT GGTGTGGCTG ACTGCCGGAC 

851 TGTGGCACAG CGGCTTTACC AAACTCGACT ACCCGAAACT CTGCCGCATC 

901 GCCGTCTCCA TCCTTTTCGC CTCCGCCGTT TCGCGCGCTG TTTTAATGAA 

951 CGTGAATCCG ATATTCTTCA TCACCGTTCC CGAGATTCTG ACCGCCGCCG 

1001 TGTTCATGCT TTACCTGCTG ACGTTCGTAC CGATTTTTCG AGCGAACGCG 

1051 TTTACAGACG ATCCGGAATA A 

This corresponds to the amino acid sequence <SEQ ID 854; ORF130ng-1>:

1 MRPFFVGAAV LAILGALVFF INPGAIILHR QIFLELMLPA AYGGFLTTAL
51 LDRTGFSGNL KPAATLMAVL LLVAAVLLPF LPQLAAFFVA AYWLVLLLFC
101 AWLIWLDRNT DNFALLMLLA AFTVFQTAYA VSGDLNLLRA QVHLNMAAVM
151 FVSVRVSVLL GTETLKECRL KDPVFIPNVI YKNIAITLLL HAAAELWLPA
201 QTAGFTALAV GFILLAKLRE LHHHELLRKH YVRTYYLLQL FAAAGYLWTG
251 AAKLQNLPAS APLHLITLGG MTGGVMMVWL TAGLWHSGFT KLDYPKLCRI
301 AVSILFASAV SRAVLMNVNP IFFITVPEIL TAAVFMLYLL TFVPIFRANA
351 FTDDPE*

ORF130ng-1 and ORF130-1 show 92.4% identity in 357 aa overlap:

orf 130-1 . pep MRPFFVGAAVLAILGALVFFINPGAIVLHRQIFLELMLPAAYGGFLTAALLDWTGFSGNL 
I I I M ! I I I I I M II I I ! ! I I I I! I i : ! I t I! I I I I II I I I M I I I I : I I I I M I I M I 
orfl30ng-l MRPFFVGAAVLAILGALVFFINPGAIILHRQIFLELMLPAAYGGFLTTALLDRTGFSGNL 

orf 130-1. pep KPVATLMAALLLAASAILPFSPQTASFFVAAYWLVLLLFCARLIWLDRNTDN FALLMLLA 
M : I I I M : I I I : I : : : I II II I : 1 I I I I I I I I I I I I I I I II II I I I I! I I I I I I II 
orfl30ng-l KPAATLMAVLLLVAAVLLPFLPQLAAFFVAAYWLVLLLFCAWLIWLDRNTDN FALLMLLA 

orf 130-1 - pep AFTVFQTAYAVSGDLNLLRAQVHLNMAAVMFVSVRVSILLGAEALKECRLKDPVFIPNIV 
I II I II I I ! I I I I M M I I I I I II I I I I I I I I I I M I : I ! I I : II M II I M I I M I : : 
orf 130ng-l AFTVFQTAYAVSGDLNLLRAQVHLNMAAVMFVSVRVSVLLGTETLKECRLKDPVFIPNVI 

orf 130-1 . pep YKNIAITFLLLHAAAELWLPAQTAGFTALAVGFILLAKLRE LHHHELLRKH YVRTYYLLQ 
M It I ( I I II I M I I II I I I M I I I II I II I I II I I I I I I I I I II M I I M II I I I M I 
orfl30ng-l YKNIAIT-LLLHAAAELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQ 

orf 130-1 , pep LFAAAGYLWTGAAKLQNLPASAPLHLITLGGMMGGVMMVWLTAGLWHSGFTKLDYPKLCR 
I I I I I II I II I I M I I I I I II II I I I I I II I I I I I I I II I I I M I I I I M I I I II I I I I 
orf 130ng-l LFAAAGYLWTGAAKLQNLPASAPLHLITLGGMTGGVMMVWLTAGLWHSGFTKLDYPKLCR 

orf 130-1 .pep IAVPILFAAAVSRAFLMNVNPIFFITVPAILTAAVFVLYLFTFIPIFRANAFTDDPEX 
III IIIIMIMI I I I II I I I I M I I I I I I I M : I I I : II : I I I I I I M II I I II 
orfl30ng-l IAVSILFASAVSRAVLMNVNPIFFITVPEILTAAVFMLYLLTFVPIFRANAFTDDPEX 

Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 101

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 855>: 

1 ATGGAAATTC GGGCAATAAA ATATACGGCA ATGGCTGCGT TGCTTGCATT 

51 TACGGTTGCA GGCTGCCGGC TGGCGGGGTG GTATGAGTGT TCGTCCCTCA 

101 CCGGCTGGTG TAAGCCGAGA AAACCGGCTG CCATCGATTT TTGGGATATT

151 GGCGGCGAGA GTCCGCCGTC TTTAGGGGAC TACGAGATAC CGCTTTCAGA 

201 CGGCAATAGT TCCGTCAGGG CAAACGAATA TGAATCCGCA CAACAATCTT 

251 ACTTTTACAG GAAAATAGGG AAGTTTGAAG C.TGCGGGCT GGATTGGCGT 

301 ACGCGTGACG GCAAACCTTT GATTGAGACG TTCAAACAGG GAGGATTTGA 

351 CTGCTTGGAA AAG..

This corresponds to the amino acid sequence <SEQ ID 856; ORF131>: 

1 MEIRAIKYTA MAALLAFTVA GCRLAGWYEC SSLTGWCKPR KPAAIDFWDI 
51 GGESPPSLGD YEIPLSDGNS SVRANEYESA QQSYFYRKIG KFEXCGLDWR 
101 TRDGKPLIET FKQGGFDCLE K. . 

Further work revealed the complete nucleotide sequence <SEQ ID 857>:

1 ATGGAAATTC GGGCAATAAA ATATACGGCA ATGGCTGCGT TGCTTGCATT 

51 TACGGTTGCA GGCTGCCGGC TGGCGGGGTG GTATGAGTGT TCGTCCCTCA 

101 CCGGCTGGTG TAAGCCGAGA AAACCGGCTG CCATCGATTT TTGGGATATT 

151 GGCGGCGAGA GTCCGCCGTC TTTAGGGGAC TACGAGATAC CGCTTTCAGA 

201 CGGCAATCGT TCCGTCAGGG CAAACGAATA TGAATCCGCA CAACAATCTT

251 ACTTTTACAG GAAAATAGGG AAGTTTGAAG CCTGCGGGCT GGATTGGCGT

301 ACGCGTGACG GCAAACCTTT GATTGAGACG TTCAAACAGG GAGGATTTGA 

351 CTGCTTGGAA AAGCAGGGGT TGCGGCGCAA CGGTCTGTCC GAGCGCGTCC 

401 GATGGTAA 

This corresponds to the amino acid sequence <SEQ ID 858; ORF131-1>:

1 MEIRAIKYTA MAALLAFTVA GCRLAGWYEC SSLTGWCKPR KPAAIDFWDI
51 GGESPPSLGD YEIPLSDGNR SVRANEYESA QQSYFYRKIG KFEACGLDWR 
101 TRDGKPLIET FKQGGFDCLE KQGLRRNGLS ERVRW* 
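
The correspondence between a nucleotide listing and its amino acid sequence can be checked by translating the first reading frame with the standard genetic code. The sketch below is illustrative and is not the software used for this application; the names `CODON_TABLE` and `translate` are assumptions, and any codon containing an ambiguous or unread base is reported as `X`.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate frame 1 of a DNA string, stopping at the first stop codon."""
    seq = "".join(dna.split()).upper()  # drop listing spaces, accept lower case
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # ambiguous codon -> X
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


# First 30 and last 18 bases of SEQ ID 857 as listed above.
print(translate("ATGGAAATTC GGGCAATAAA ATATACGGCA"))
print(translate("GAGCGCGTCC GATGGTAA"))
```

The two fragments translate to the first ten and the last five residues (plus the stop) of ORF131-1 as listed.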

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A)

ORF131 shows 95.0% identity over a 121 aa overlap with an ORF (ORF131a) from strain A of N.
meningitidis:

                     10        20        30        40        50        60
orf131.pep   MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD
             |||||||||||||||||||||||||||||||||:|||||||||||||||||||||||| |
orf131a      MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLSGWCKPRKPAAIDFWDIGGESPPSLED
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf131.pep   YEIPLSDGNSSVRANEYESAQQSYFYRKIGKFEXCGLDWRTRDGKPLIETFKQGGFDCLE
             ||||||||| ||||||||||||||||||||||| ||||||||||||||||||| |||||:
orf131a      YEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQEGFDCLK
                     70        80        90       100       110       120

orf131.pep   K
             |
orf131a      KQGLRRNGLSERVRWX
                    130

The complete length ORF131a nucleotide sequence <SEQ ID 859> is:

1 ATGGAAATTC GGGCAATAAA ATATACGGCA ATGGCTGCGT TGCTTGCATT
51 TACGGTTGCA GGCTGCCGGT TGGCAGGTTG GTATGAGTGT TCGTCCCTGT
101 CCGGCTGGTG TAAGCCGAGA AAACCTGCCG CCATCGATTT TTGGGATATT
151 GGCGGCGAGA GTCCTCCGTC TTTAGAGGAC TACGAGATAC CGCTTTCAGA
201 CGGCAATCGT TCCGTCAGGG CAAACGAATA TGAATCCGCA CAACAATCTT
251 ACTTTTACAG GAAAATAGGG AAGTTTGAAG CCTGCGGGTT GGATTGGCGT
301 ACGCGTGACG GCAAACCTTT GATTGAGACG TTCAAACAGG AAGGTTTTGA
351 TTGTTTGAAA AAGCAGGGGT TGCGGCGCAA CGGTCTGTCC GAGCGCGTCC
401 GATGGTAA

This encodes a protein having amino acid sequence <SEQ ID 860>: 

1 MEIRAIKYTA MAALLAFTVA GCRLAGWYEC SSLSGWCKPR KPAAIDFWDI
51 GGESPPSLED YEIPLSDGNR SVRANEYESA QQSYFYRKIG KFEACGLDWR
101 TRDGKPLIET FKQEGFDCLK KQGLRRNGLS ERVRW*

ORF131a and ORF131-1 show 97.0% identity in 135 aa overlap: 

orf131a.pep  MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLSGWCKPRKPAAIDFWDIGGESPPSLED
             |||||||||||||||||||||||||||||||||:|||||||||||||||||||||||| |
orf131-1     MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD

orf131a.pep  YEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQEGFDCLK
             ||||||||||||||||||||||||||||||||||||||||||||||||||||| |||||:
orf131-1     YEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQGGFDCLE

orf131a.pep  KQGLRRNGLSERVRWX
             ||||||||||||||||
orf131-1     KQGLRRNGLSERVRWX



Homology with a predicted ORF from N. gonorrhoeae

ORF131 shows 89.3% identity over a 121 aa overlap with a predicted ORF (ORF131ng) from
N. gonorrhoeae:

orf131.pep  MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD 60
            ||||:||||| |||:||||||||||||||| ||:||||||||||||||||||||| || |
orf131ng    MEIRVIKYTATAALFAFTVAGCRLAGWYECLSLSGWCKPRKPAAIDFWDIGGESPLSLED 60

orf131.pep  YEIPLSDGNSSVRANEYESAQQSYFYRKIGKFEXCGLDWRTRDGKPLIETFKQGGFDCLE 120
            ||||||||| |||||||||||:||||||||||| |||||||||||||:| ||| ||||||
orf131ng    YEIPLSDGNRSVRANEYESAQKSYFYRKIGKFEACGLDWRTRDGKPLVERFKQEGFDCLE 120

orf131.pep  K 121
            |
orf131ng    KQGLRRNGLSERVRW 134

A complete length ORF131ng nucleotide sequence <SEQ ID 861> was predicted to encode a
protein having amino acid sequence <SEQ ID 862>:

1 MEIRVIKYTA TAALFAFTVA GCRLAGWYEC LSLSGWCKPR KPAAIDFWDI
51 GGESPLSLED YEIPLSDGNR SVRANEYESA QKSYFYRKIG KFEACGLDWR 
101 TRDGKPLVER FKQEGFDCLE KQGLRRNGLS ERVRW* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 863>: 

1 ATGGAAATTC GGGTAATAAA ATATACGGCA ACGGCTGCGT TGTTTGCATT
51 TACGGTTGCA GGCTGCCGGC TGGCGGGGTG GTATGAGTGT TCGTCCTTGT
101 CCGGCTGGTG TAAGCCGAGA AAACCTGCCG CCATCGATTT TTGGGATATT
151 GGCGGCGAGA GtCcgctGTC TTTAGAGGAC TACGAGATAC CGCTTTCAGA
201 CGGCAATCGT TCCGTCAGGG CAAACGAATA TGAATCCGCG CAAAAATCTT
251 ACTTTTATAG GAAAATAGGG AAGTTTGAAG CCTGCGGGTT GGATTGGCGT
301 ACGCGTGACG GCAAACCTTT GGTTGAGAGG TTCAAACAGG AAGGTTTCGA
351 CTGTTTGGAA AAGCAGGGGT TGCGGCGCAA CGGCCTGTCC GAGCGCGTCC
401 GATGGTAA

This corresponds to the amino acid sequence <SEQ ID 864; ORF131ng-1>:

1 MEIRVIKYTA TAALFAFTVA GCRLAGWYEC SSLSGWCKPR KPAAIDFWDI
51 GGESPLSLED YEIPLSDGNR SVRANEYESA QKSYFYRKIG KFEACGLDWR 
101 TRDGKPLVER FKQEGFDCLE KQGLRRNGLS ERVRW* 

ORF131ng-1 and ORF131-1 show 92.6% identity in 135 aa overlap:

orf131ng-1.pep  MEIRVIKYTATAALFAFTVAGCRLAGWYECSSLSGWCKPRKPAAIDFWDIGGESPLSLED
                ||||:||||| |||:||||||||||||||||||:||||||||||||||||||||| || |
orf131-1        MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD

orf131ng-1.pep  YEIPLSDGNRSVRANEYESAQKSYFYRKIGKFEACGLDWRTRDGKPLVERFKQEGFDCLE
                |||||||||||||||||||||:|||||||||||||||||||||||||:| ||| ||||||
orf131-1        YEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQGGFDCLE

orf131ng-1.pep  KQGLRRNGLSERVRWX
                ||||||||||||||||
orf131-1        KQGLRRNGLSERVRWX



Based on the presence of a predicted prokaryotic membrane lipoprotein lipid attachment site, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 
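
A lipid attachment site of this kind is commonly screened for with a sequence pattern. The sketch below is an illustration under stated assumptions and is not necessarily the method used for the analysis reported here: the pattern is the PROSITE-style consensus for prokaryotic membrane lipoprotein lipid attachment sites as it is commonly cited, the position window for the cysteine (residues 15 to 35) is likewise an assumption, and the names `LIPOBOX` and `lipid_attachment_cys` are hypothetical.

```python
import re

# Assumed PROSITE-style pattern (as commonly cited):
#   {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C
# A lookahead is used so that overlapping candidate windows are all examined.
LIPOBOX = re.compile(r"(?=[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C)")
WINDOW = 11  # 6 + 2 + 1 + 1 + 1 residues, the last one being the cysteine


def lipid_attachment_cys(protein, min_pos=15, max_pos=35):
    """Return the 1-based position of the first candidate Cys, or None."""
    for match in LIPOBOX.finditer(protein):
        cys_pos = match.start() + WINDOW
        if min_pos <= cys_pos <= max_pos:
            return cys_pos
    return None


# N-terminal 40 residues of ORF131-1 <SEQ ID 858>.
orf131_1_nterm = "MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPR"
print(lipid_attachment_cys(orf131_1_nterm))
```

With this pattern the N-terminus of ORF131-1 matches with the cysteine at position 22 (the residues FTVAGC).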



Example 102 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 865>:



1 ATGAAACACA TCCATATTAT CGGTATCGGC GGCACGTTTA TGGGCGGGCT 

51 TGCCGCCATT GCCAAAGAAG CGGGGTTTGA AGTCAGCGGT TGCGACGCGA 

101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG TATAGACGTG 

151 TATGAAGGCT TCGATGCCGC TCAGTTGGAC GAATTTAAAG CCGACGTTTA 

201 CGTTATCGGC AATGTCGCCA AGCGCGGGAT GGATGTGGTT GAAGCGATTT

251 TGAACCTCGG CCTGCCtTAT ATtTcCGGCC CGCAATGGCT GTCGGAAAAC 

301 GTGCTGCACC ATCATTGGGT ACTCGGTGTG GCGGGGACgC ACGGCAAAAC 

351 GACCACCGCC TCCATGCTCG CATGGGTCTT GGAATATgCC GGCCTCGCGC 

401 CGGGCTTCCT TATtGGCGGC GTACC.GGAA AATttCGGCG TTTCCGCCCG 

451 CCTGCCGCAA ACGCCGCGCC AAGACCCGAA CAGCCAATCG CCGTTTTTcG 

501 TCATCGAAGC CGACGAATAC GACACCGCCT TTtTCGACAA ACGTTCTAAA 

551 TtCGTGCATT ACCGTCCGCG TACCGCCGTG TTGAACAATC TGGAATTCGA 

601 CCACGCCGAC ATCTTTGCCG ACTTGGGCGC GATACAGACc CAGTTCCACT 

651 ACCTCGTGCG TACCGTGCCG TCTGAAGGCT TAATCGTCTG CAACGGACGG 

701 CAGCAAAGCC TGCAAGATAC TTTGGACAAA GGCTGCTGGA CGCCGGTGGA 

751 AAAATTCGGC ACGGAACACG GCTGGCA..

This corresponds to the amino acid sequence <SEQ ID 866; ORF132>: 



1 MKHIHIIGIG GTFMGGLAAI AKEAGFEVSG CDAKMYPPMS TQLEALGIDV
51 YEGFDAAQLD EFKADVYVIG NVAKRGMDVV EAILNLGLPY ISGPQWLSEN
101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VXGKFRRFRP
151 PAANAAPRPE QPIAVFRHRS RRIRHRLFRQ TFXIRALPSA YRRVEQSGIR
201 PRRHLCRLGR DTDPVPLPRA YRAVXRLNRL QRTAAKPARY FGQRLLDAGG
251 KIRHGTRLA..

Further work revealed the complete nucleotide sequence <SEQ ID 867>:

1 ATGAAACACA TCCATATTAT CGGTATCGGC GGCACGTTTA TGGGCGGGCT

51 TGCCGCCATT GCCAAAGAAG CGGGGTTTGA AGTCAGCGGT TGCGACGCGA

101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG TATAGACGTG

151 TATGAAGGCT TCGATGCCGC TCAGTTGGAC GAATTTAAAG CCGACGTTTA

201 CGTTATCGGC AATGTCGCCA AGCGCGGGAT GGATGTGGTT GAAGCGATTT

251 TGAACCTCGG CCTGCCTTAT ATTTCCGGCC CGCAATGGCT GTCGGAAAAC

301 GTGCTGCACC ATCATTGGGT ACTCGGTGTG GCGGGGACGC ACGGCAAAAC

351 GACCACCGCC TCCATGCTCG CATGGGTCTT GGAATATGCC GGCCTCGCGC

401 CGGGCTTCCT TATTGGCGGC GTACCGGAAA ATTTCGGCGT TTCCGCCCGC

451 CTGCCGCAAA CGCCGCGCCA AGACCCGAAC AGCCAATCGC CGTTTTTCGT

501 CATCGAAGCC GACGAATACG ACACCGCCTT TTTCGACAAA CGTTCTAAAT

551 TCGTGCATTA CCGTCCGCGT ACCGCCGTGT TGAACAATCT GGAATTCGAC

601 CACGCCGACA TCTTTGCCGA CTTGGGCGCG ATACAGACCC AGTTCCACTA

651 CCTCGTGCGT ACCGTGCCGT CTGAAGGCTT AATCGTCTGC AACGGACGGC 

701 AGCAAAGCCT GCAAGATACT TTGGACAAAG GCTGCTGGAC GCCGGTGGAA 

751 AAATTCGGCA CGGAACACGG CTGGCAGGCC GGCGAAGCCA ATGCCGACGG

801 CTCGTTCGAC GTGTTGCTCG ACGGCAAAAC CGCCGGACGC GTCAAATGGG 

851 ATTTGATGGG CAGGCACAAC CGCATGAACG CGCTCGCCGT CATTGCCGCC 

901 GCGCGTCATG TCGGTGTCGA TATTCAGACC GCCTGCGAAG CCTTGGGCGC 

951 GTTTAAAAAC GTCAAACGCC GGATGGAAAT CAAAGGCACG GCAAACGGCA 

1001 TCACCGTTTA CGACGACTTC GCCCACCACC CGACCGCCAT CGAAACCACG 

1051 ATTCAAGGTT TGCGCCAACG CGTCGGCGGC GCGCGCATCC TCGCCGTCCT 

1101 CGAACCGCGT TCCAACACGA TGAAGCTGGG CACGATGAAG TCCGCCCTGC 

1151 CTGTAAGCCT CAAAGAAGCC GACCAAGTGT TCTGCTACGC CGGCGGCGTG 

1201 GACTGGGACG TCGCCGAAGC CCTCGCGCCT TTGGGCGGCA GGCTGAACGT 

1251 CGGCAAAGAC TTCGATGCCT TCGTTGCCGA AATCGTGAAA AACGCCGAAG 

1301 TAGGCGACCA TATTTTGGTG ATGAGCAACG GCGGTTTCGG CGGAATACAC 

1351 GGAAAGCTGC TGGAAGCTTT GAGATAG 

This corresponds to the amino acid sequence <SEQ ID 868; ORF132-1>:

1 MKHIHIIGIG GTFMGGLAAI AKEAGFEVSG CDAKMYPPMS TQLEALGIDV

51 YEGFDAAQLD EFKADVYVIG NVAKRGMDVV EAILNLGLPY ISGPQWLSEN 

101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VPENFGVSAR 

151 LPQTPRQDPN SQSPFFVIEA DEYDTAFFDK RSKFVHYRPR TAVLNNLEFD 

201 HADIFADLGA IQTQFHYLVR TVPSEGLIVC NGRQQSLQDT LDKGCWTPVE 

251 KFGTEHGWQA GEANADGSFD VLLDGKTAGR VKWDLMGRHN RMNALAVIAA 

301 ARHVGVDIQT ACEALGAFKN VKRRMEIKGT ANGITVYDDF AHHPTAIETT 

351 IQGLRQRVGG ARILAVLEPR SNTMKLGTMK SALPVSLKEA DQVFCYAGGV 

401 DWDVAEALAP LGGRLNVGKD FDAFVAEIVK NAEVGDHILV MSNGGFGGIH

451 GKLLEALR* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with the hypothetical o457 protein of E. coli (accession number U14003)
ORF132 and o457 show 58% aa identity in 140 aa overlap:

Orf132:   4 IHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLDEFK 63
            IHI+GI GTFMGGLA +A++ G EV+G DA +YPPMST LE  GI++ +G+DA+QL+  +
o457:     3 IHILGICGTFMGGLAMLARQLGHEVTGSDANVYPPMSTLLEKQGIELIQGYDASQLEP-Q 61

Orf132:  64 ADVYVIGNVAKRGMDVVEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTASML 123
             D+ +IGN   RG   VEA+L   +PY+SGPQWL + VL   WVL VAGTHGKTTTA M
o457:    62 PDLVIIGNAMTRGNPCVEAVLEKNIPYMSGPQWLHDFVLRDRWVLAVAGTHGKTTTAGMA 121

Orf132: 124 AWVLEYAGLAPGFLIGGVXG 143
             W+LE  G  PGF+IGGV G
o457:   122 TWILEQCGYKPGFVIGGVPG 141

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF132 shows 74.6% identity over a 189 aa overlap with an ORF (ORF132a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf 132. pep MKHIHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLD 
I I I I I I M I I I I i II I : I I I I I I I M ! I I i I I I I I I I I I I I I II i I 1 I I I I I I : I I II 
orf 132a MKHIHIIGIGGTFMGGIAAIAKEAGFEXSGCDAKMYPPMSTQLEALGIGVYEGFDTAQLD 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 132 . pep EFKADVYVIGNVAKRGMDWEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTA 
I I I I I I II II I I 1 II I I I I I I I ) I I 11111111111:11 I I I I I MM M M M M 
orf 132a E FKADV Y V IGN VAKRGMD WE AI LNRG L P Y I S G PQWLAENXLHHHWX LGVAXTHGKTTTA 

70 80 90 100 110 120 



130 



140 



150 160 
SMLAWVLEYAGLAPGFLIGGVXGKFR RFRPPAANAAPRPEQPI AVFR

I M I I I I ( I I I M I I I i I M : I I : I : I : : I : • 1 

SMLAWVLEYAGLAPGFXIGGVPENFSVSARL-PQTPRQDPNSQSPFFVIEADEYDTAFFD 

130 140 150 160 170 

170 180 190 200 210 220 

HRSRRIRHRLFRQTFXIRALPSAYRRVEQSGIRPRRHLCRLGRDTDPVPLPRAYRAVXRL 

: I I : : : : I 

KRSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHHLVRTVPSEGLIVCNGRQQSLQD 
180 190 200 210 220 230 

The complete length ORF132a nucleotide sequence <SEQ ID 869> is: 

1 ATGAAACACA TCCACATTAT CGGTATCGGC GGCACGTTTA TGGGTGGGAT

51 TGCCGCCATT GCCAAAGAAG CAGGGTTTGA ANTCAGCGGT TGCGATGCGA 

101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG CATAGGCGTG 

151 TATGAAGGCT TCGACACCGC GCAGTTGGAC GAATTTAAAG CCGACGTTTA 

201 CGTTATCGGC AATGTCGCCA AGCGCGGGAT GGATGTGGTT GAAGCGATTT 

251 TGAACCGTGG GCTGCCTTAT ATTTCCGGCC CGCAATGGCT GGCTGAAAAC 

301 NTGCTGCACC ATCATTGGNN ACTCGGCGTG GCGGNGACGC ACGGCAAAAC 

351 GACCACCGCG TCTATGCTCG CGTGGGTTTT GGAATATGCC GGACTCGCAC 

401 CGGGCTTCNT TATCGGCGGC GTACCGGAAA ACTTCAGCGT TTCCGCCCGC 

451 CTGCCGCAAA CGCCGCGCCA AGACCCGAAC AGCCAATCGC CGTTTTTCGT

501 CATTGAAGCC GACGAATACG ACACCGCGTT TTTCGACAAA CGCTCCAAAT 

551 TCGTGCATTA CCGTCCGCGT ACCGCCGTGT TGAACAATCT GGAATTCGAC 

601 CACGCCGACA TCTTCGCCGA TTTGGGCGCG ATACAGACCC AGTTCCACCA 

651 CCTCGTGCGT ACCGTGCCGT CTGAAGGCCT CATCGTCTGC AACGGACGGC 

701 AGCAAAGCCT GCAAGACACT TTGGACAAAG GCTGCTGGAC GCCGGTGGAA 

751 AAATTCGGCA CGGAACACGG CTGGCAGGCC GGCGAAGCCA ATGCCGATGG 

801 CTCGTTCGAC GTGTTGCTTG ACGGCAAAAA AGCCGGACAC GTCGCTTGGA 

851 GTTTGATGGG CGGACACAAC CGCATGAACG CGCTCGCNGT CATCGCCGCC 

901 GCGCGTCATG CCGGAGTNGA CATTCAGACG GCCTGCGAAG CCTTGAGCAC 

951 GTTTAAAAAC GTCAAACGCC GCATGGAAAT CAAAGGCACG GCAAACGGTA 

1001 TCACCGTTTA CGACGACTTC GCCCACCATC CGACCGCTAT CGAAACCACG 

1051 ATTCAAGGTT TGCGCCAGCG CGTCGGCGGC GCGCGCATCC TCGCCGTCCT 

1101 CGAACCGCGT TCCAATACGA TGAAGCTGGG TACGATGAAA GCCGCCCTGC 

1151 CCGCAAGCCT CAAAGAAGCC GACCAAGTGT TCTGNTACGC CGGCGGCGCG 

1201 GACTGGGACG TTGCCGAAGC CCTCGCGCCT TTGGGCGGCA GGCTGCACGT 

1251 CGGCAAAGAC TTCGATGCCT TCGTTGCCGA AATCGTGAAA AACGCCGAAG 

1301 CAGGCGACCA TATTTTGGTG ATGAGCAACG GCGGTTTCGG CGGAATACAC 

1351 ACCAAACTGC TGGACGCTTT GAGATAG 

This encodes a protein having amino acid sequence <SEQ ID 870>: 

1 MKHIHIIGIG GTFMGGIAAI AKEAGFEXSG CDAKMYPPMS TQLEALGIGV
51 YEGFDTAQLD EFKADVYVIG NVAKRGMDVV EAILNRGLPY ISGPQWLAEN
101 XLHHHWXLGV AXTHGKTTTA SMLAWVLEYA GLAPGFXIGG VPENFSVSAR
151 LPQTPRQDPN SQSPFFVIEA DEYDTAFFDK RSKFVHYRPR TAVLNNLEFD
201 HADIFADLGA IQTQFHHLVR TVPSEGLIVC NGRQQSLQDT LDKGCWTPVE
251 KFGTEHGWQA GEANADGSFD VLLDGKKAGH VAWSLMGGHN RMNALAVIAA
301 ARHAGVDIQT ACEALSTFKN VKRRMEIKGT ANGITVYDDF AHHPTAIETT
351 IQGLRQRVGG ARILAVLEPR SNTMKLGTMK AALPASLKEA DQVFXYAGGA
401 DWDVAEALAP LGGRLHVGKD FDAFVAEIVK NAEAGDHILV MSNGGFGGIH
451 TKLLDALR*

ORF132a and ORF132-1 show 93.9% identity in 458 aa overlap: 

orf132a.pep MKHIHIIGIGGTFMGGIAAIAKEAGFEXSGCDAKMYPPMSTQLEALGIGVYEGFDTAQLD
I M I I I II II M M M : I I I II I II I I I II I I I I I I I I I II I I M I 1 MIIM'.MM 
orf 132-1 MKHIHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLD 

orf 132a . pep EFKADVYVIGNVAKRGMDVVEAILNRGLPYISGPQWLAENXLHHHWXLGVAXTHGKTTTA 
I I I I I I I I I I M M I M I I M II M I I I I ! II II I I : II Mill I II I II M II I I 
orf132-1 EFKADVYVIGNVAKRGMDVVEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTA

orf 132a. pep SMLAWVLEYAGLAPGFXIGGVPENFSVSARLPQTPRQDPNSQSPFFVIEADEYDTAFFDK 
I I I I M M I I I I I I I I I I I I I I I I : I I M I I II I I I I I M I I ! M I I II I I II I M M I 
orf 132-1 SMLAWVLEYAGLAPGFLIGGVPENFGVSARLPQTPRQDPNSQSPFFVIEADEYDTAFFDK 



orf132a.pep RSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHHLVRTVPSEGLIVCNGRQQSLQDT
1 i I I i | I II I I I It i I I I I I t I I I M I If I I t i I t t : I It I t I I I t I I I I I i I I I I I I I I 
orf 132-1 RSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHYLVRTVPSEGLIVCNGRQQSLQDT 

orf 132a . pep LDKGCWTPVEKFGTEHGWQAGEANADGSFDVLLDGKKAGHVAWSLMGGHNRMNALAVIAA 
I I I I M I I I I t I t I I t I I I I t t t I I I I I M I It I I I 11:1 1:111 I I I 1 I I I I I I I I 
orf 132-1 LDKGCWTPVEKFGTEHGWQAGEANADGSFDVLLDGKTAGRVKWDLMGRHNRMNALAVIAA 

orf 132a . pep ARHAGVDIQTACEALSTFKNVKRRMEIKGTANGITVYDDFAHHPTAIETTIQGLRQRVGG 
It I : I 1 I I 1 I I II I t : : I I t I II I I I I I I I I M t t I II t M I I t t t I I II I i I M II 1 I I 
orf 132-1 ARHVGVDIQTACEALGAFKNVKRRMEIKGTANGITVYDDFAHHPTAIETTIQGLRQRVGG 

orf132a.pep ARILAVLEPRSNTMKLGTMKAALPASLKEADQVFXYAGGADWDVAEALAPLGGRLHVGKD
I II t I I I I II t I I I I II II t : II t : I I I I I I I f 1 I M I : I I I 1 I I I II I II ! I I : I I 1 I 
orf 132-1 ARILAVLEPRSNTMKLGTMKSALPVSLKEADQVFCYAGGVDWDVAEALAPLGGRLNVGKD 

orf 132a. pep FDAFVAEIVKNAEAGDHILVMSNGGFGGIHTKLLDALRX 
I I I M I I I I I I I I : i I t I I I I I I I I I I M I 111:1111 
orf132-1 FDAFVAEIVKNAEVGDHILVMSNGGFGGIHGKLLEALRX
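
The identity figure quoted for an alignment such as the one above can be recomputed from the aligned rows. The following is a minimal sketch in Python, assuming that both rows are given as equal-length strings with "-" for gaps, that the overlap is the number of aligned columns, and that an X paired with a defined residue counts as a mismatch. These conventions are assumptions for illustration; the text does not state how the alignment program computed its figure. The name `percent_identity` is hypothetical.

```python
def percent_identity(row_a: str, row_b: str) -> tuple[int, int, float]:
    """Return (identical columns, overlap length, percent identity).

    Assumptions: rows are already aligned and of equal length, "-" marks
    a gap, and a column counts as identical only when both rows carry the
    same non-gap letter.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    identical = sum(1 for x, y in zip(row_a, row_b) if x == y and x != "-")
    overlap = len(row_a)
    return identical, overlap, 100.0 * identical / overlap


# First 60 columns of the ORF132a / ORF132-1 alignment shown above.
ROW_A = "MKHIHIIGIGGTFMGGIAAIAKEAGFEXSGCDAKMYPPMSTQLEALGIGVYEGFDTAQLD"
ROW_B = "MKHIHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLD"

if __name__ == "__main__":
    identical, overlap, pct = percent_identity(ROW_A, ROW_B)
    print(f"{identical}/{overlap} = {pct:.1f}%")
```

For these first 60 columns the sketch counts 56 identical positions out of 60; summing the same count over every row of an alignment gives the overall figure.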

Homology with a predicted ORF from N. gonorrhoeae 

ORF132 shows 89.6% identity over 259 aa overlap with a predicted ORF (ORF132ng) from N.
gonorrhoeae:

orf132.pep  MKHIHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLD 60
            ||||||||||||||||:|||||||||:||||||||||||||||||||| |:||||||||:
orf132ng    MKHIHIIGIGGTFMGGIAAIAKEAGFKVSGCDAKMYPPMSTQLEALGIGVHEGFDAAQLE 60

orf132.pep  EFKADVYVIGNVAKRGMDVVEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTA 120
            ||:||:|||||||:||||||||||| |||||||||||:||||||||||||||||||||||
orf132ng    EFQADIYVIGNVARRGMDVVEAILNRGLPYISGPQWLAENVLHHHWVLGVAGTHGKTTTA 120

orf132.pep  SMLAWVLEYAGLAPGFLIGGVXGKFRRFRPPAANAAPRPEQPIAVFRHRSRRIRHRLFRQ 180
            ||||||||||||||||||||| |||||||||:|||| |||| ||||||||||||||||||
orf132ng    SMLAWVLEYAGLAPGFLIGGVPGKFRRFRPPTANAASRPEQQIAVFRHRSRRIRHRLFRQ 180

orf132.pep  TFXIRALPSAYRRVEQSGIRPRRHLCRLGRDTDPVPLPRAYRAVXRLNRLQRTAAKPARY 240
            |: ||||  |||||||||||||||| |||||||||| |||:|:: | :||||||||||||
orf132ng    TLQIRALSPAYRRVEQSGIRPRRHLRRLGRDTDPVPPPRAHRTIRRPHRLQRTAAKPARY 240

orf132.pep  FGQRLLDAGGKIRHGTRLA 259
            |||||||||||||| ||||
orf132ng    FGQRLLDAGGKIRHRTRLADW 261



An ORF132ng nucleotide sequence <SEQ ID 871> was predicted to encode a protein having amino 
acid sequence <SEQ ID 872>: 



1 MKHIHIIGIG GTFMGGIAAI AKEAGFKVSG CDAKMYPPMS TQLEALGIGV

51 HEGFDAAQLE EFQADIYVIG NVARRGMDVV EAILNRGLPY ISGPQWLAEN

101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VPGKFRRFRP

151 PTANAASRPE QQIAVFRHRS RRIRHRLFRQ TLQIRALSPA YRRVEQSGIR

201 PRRHLRRLGR DTDPVPPPRA HRTIRRPHRL QRTAAKPARY FGQRLLDAGG

251 KIRHRTRLAD W*



Further work revealed the following gonococcal DNA sequence <SEQ ID 873>: 



1 ATGAAACACA TCCACATTAT CGGTATCGGC GGCACGTTTA TGGGCGGGAT

51 TGCCGCCATT GCCAAAGAAG CCGGGTTCAA AGTCAGCGGT TGCGACGCGA

101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG CATAGGCGTA

151 CACGAAGGCT TCGATGCCGC GCAGTTGGAA GAATTTCAAG CCGATATTTA

201 CGTCATCGGC AATGTCGCCA GGCGCGGGAT GGATGTGGTC GAGGCGATTT

251 TGAACCGTGG GCTGCCTTAT ATTTCCGGCC CGCAATGGCT GGCTGAAAac

301 GTGCtgcacc atcaTTGGgt ACTCGGCGTG GcagggaCGC ACGGcaaAac

351 gaccaCcGcg tCCATGCTCG CCTGGGTCTT GGAATATGCC GGACTCGCGC

401 CGGGCTTCCT CATCGGCGGt gtaccggaAA ATTTCGGCGT TTCCGCCCGC

451 CTACCGCAAA CGCCGCGTCA AGACCCGAAC AGCAAATCGC CGTTTTTCGT

501 CATCGAAGCC GACGAATACG ACACCGCCTT TTTCGACAAA CGCTCCAAAT 

551 TCGTGCATTA TCGCCCGCGT ACCGCCGTGT TGAACAATCT GGAATTCGAC 

601 CACGCCGACA TCTTCGCCGA CTTGGGCGCG ATACAGACCC AGTTCCACCA 

651 CCTCGTGCGC ACCGTACCAT CCGAAGGCCT CATCGTCTGC AACGGACAGC 

701 AGCAAAGCCT GCAAGATACT TTGGACAAAG GCTGCTGGAC GCCGGTGGAA

751 AAATTCGGCA CCGGACACGG CTGGCAGATT GGTGAAGTCA ATGCCGACGG

801 CTCGTTCGAC GTATTGCTTG ACGGCAAAAA AGCCGGACAC GTCGCATGGG 

851 ATTTGATGGG CGGACACAAC CGCATGAACG CGCTCGCCGT CATCGCTGCC 

901 GCACGCCATG CCGGAGTCGA TGTTCAGACG GCCTGCGAAG CCTTGGGTGC 

951 GTTTAAAAAC GTCAAACGCC GCATGGAAAT CAAAGGCACG GCAAACGGCA 

1001 TCACCGTTTA CGACGATTTC GCCCACCACC CGACCGCCAT CGAAACCACG 

1051 ATTCAAGGTT TGCGCCAACG TGTCGGCGGC GCGCGCATCC TCGCCGTCCT 

1101 CGAGCCGCGT TCCAACACCA TGAAACTCGG CACGATGAAG TCCGCCCTGC 

1151 CCGCAAGCCT CAAAGAAGCC GACCAAGTGT TCTGCTACGC CGGCGGCGCG 

1201 GACTGGGACG TTGCCGAAGC CCTCGCGCCT TTGGGCTGCA GGCTGCGCGT 

1251 CGGTAAAGAT TTCGATACCT TCGTTGCCGA AATTGTGAAA AACGCCCGAA 

1301 CCGGCGACCA TATTTTGGTG ATGAGCAACG GCGGTTTCGG CGGAATACAC 

1351 ACCAAACTGC TGGACGCTTT GAGATAG 

This corresponds to the amino acid sequence <SEQ ID 874; ORF132ng-1>:

1 MKHIHIIGIG GTFMGGIAAI AKEAGFKVSG CDAKMYPPMS TQLEALGIGV

51 HEGFDAAQLE EFQADIYVIG NVARRGMDVV EAILNRGLPY ISGPQWLAEN

101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VPENFGVSAR 

151 LPQTPRQDPN SKSPFFVIEA DEYDTAFFDK RSKFVHYRPR TAVLNNLEFD 

201 HADIFADLGA IQTQFHHLVR TVPSEGLIVC NGQQQSLQDT LDKGCWTPVE 

251 KFGTGHGWQI GEVNADGSFD VLLDGKKAGH VAWDLMGGHN RMNALAVIAA 

301 ARHAGVDVQT ACEALGAFKN VKRRMEIKGT ANGITVYDDF AHHPTAIETT 

351 IQGLRQRVGG ARILAVLEPR SNTMKLGTMK SALPASLKEA DQVFCYAGGA 

401 DWDVAEALAP LGCRLRVGKD FDTFVAEIVK NARTGDHILV MSNGGFGGIH 

451 TKLLDALR* 
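
Listings of this kind give a position number at the start of each row, followed by blocks of ten residues. That layout allows a simple consistency check when a listing is joined into one contiguous sequence. The following is a minimal sketch in Python, assuming that every row starts with the 1-based position of its first residue; the name `parse_listing` and the sample rows are for illustration only.

```python
def parse_listing(lines):
    """Join a numbered, block-formatted listing into one sequence string.

    Raises ValueError when the position number at the start of a row does
    not equal the number of residues read so far plus one, which flags a
    row that has lost or gained residues.
    """
    blocks = []
    count = 0
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # blank separator line between rows
        start = int(tokens[0])
        if start != count + 1:
            raise ValueError(f"row labelled {start}, but {count} residues read so far")
        for block in tokens[1:]:
            blocks.append(block)
            count += len(block)
    return "".join(blocks)


GOOD_ROWS = [
    "1 MKHIHIIGIG GTFMGGIAAI AKEAGFKVSG CDAKMYPPMS TQLEALGIGV",
    "",
    "51 HEGFDAAQLE EFQADIYVIG NVARRGMDVV EAILNRGLPY ISGPQWLAEN",
    "",
    "101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VPENFGVSAR",
]

# Same rows, but the third block of the second row is one residue short.
SHORT_ROWS = [row.replace("NVARRGMDVV", "NVARRGMDW") for row in GOOD_ROWS]

if __name__ == "__main__":
    print(len(parse_listing(GOOD_ROWS)))
    try:
        parse_listing(SHORT_ROWS)
    except ValueError as err:
        print("inconsistent listing:", err)
```

A row that is one residue short is not detected on that row itself but on the next one, whose position number no longer matches the running count.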

ORF132ng-1 and ORF132-1 show 93.2% identity in 458 aa overlap:

orf 132ng-l .pep MKHIHIIGIGGTFMGGIAAIAKEAGFKVSGCDAKMYPPMSTQLEALGIGVHEGFDAAQLE 
I I I I I I I I I! I M M I : I 11 I ! I ! I I : ) I I I t M I M I I I ! M I M I I I : I M I I I I I : 
orf 132-1 MKHIHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLD 

orf 132ng-l.pep EFQADIYVIGNVARRGMDVVEAILNRGLPYISGPQWLAENVLHHHWVLGVAGTHGKTTTA 
11:11:1111111:11111)11111 ! I I 1 1 I I 1 II 1 : 1 I 11 I II I 1 I II 1 I 11 I I I I 1 I 
orf 132-1 EFKADVYVIGNVAKRGMDVVEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTA 

orfl32ng-l.pep SMLAWVLEYAGLAPGFLIGGVPENFGVSARLPQTPRQDPNSKSPFFVIEADEYDTAFFDK 
1 1 I I I I I I I 1 II I I I I I I I I I I I II I I I I I I I I I I 1 I 1 I I I : I I I I I I II I I ! I 11 I I I I 
orf 132-1 SMLAWVLEYAGLAPGFLIGGVPENFGVSARLPQTPRQDPNSQSPFFVIEADEYDTAFFDK 

orf 132ng-l . pep RSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHHLVRTVPSEGLIVCNGQQQSLQDT 
I 1 I I I I I 11 I I II I I I II I I I I I II I I I I I I I I I I 1 : I M I I 1 1 I I I I 1 I 1 I : I II I I I I 
orf 132-1 RSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHYLVRTVPSEGLIVCNGRQQSLQDT 

orf 132ng-l . pep LDKGCWTPVEKFGTGHGWQIGEVNADGSFDVLLDGKKAGHVAWDLMGGHNRMNALAVIAA 

I I I I I 1 I I I I I I I 1 I II 1 Ihlllllllllllll 11:1 Mill I I I I II 11 I I I I 
orf 132-1 LDKGCWTPVEKFGTEHGWQAGEANADGSFDVLLDGKTAGRVKWDLMGRHNRMNALAVIAA 

orf132ng-1.pep ARHAGVDVQTACEALGAFKNVKRRMEIKGTANGITVYDDFAHHPTAIETTIQGLRQRVGG

II I : I M : I I I I I I I I I M I I I I I I I I 1 I I I I I I I I I I 1 I I I I I I I 1 I I I I I I M I I I M 
orf132-1 ARHVGVDIQTACEALGAFKNVKRRMEIKGTANGITVYDDFAHHPTAIETTIQGLRQRVGG

orf 132ng-l . pep ARILAVLEPRSNTMKLGTMKSALPASLKEADQVFCYAGGADWDVAEALAPLGCRLRVGKD 
I II I I I I I I 1 I I I I I I I M I I I I I : I I I I I I I I I I I I I I : I I I I II I I I I I I II I I I I 
orf 132-1 ARILAVLEPRSNTMKLGTMKSALPVSLKEADQVFCYAGGVDWDVAEALAPLGGRLNVGKD 

orf 132ng-l .pep FDTFVAEIVKNARTGDHILVMSNGGFGGIHTKLLDALRX 
I I : I I I I I 1 1 II :: I I I II I I I I I I 11 I 1 I I I I : I I I I 
orf132-1 FDAFVAEIVKNAEVGDHILVMSNGGFGGIHGKLLEALRX



In addition, ORF132ng-1 is homologous to a hypothetical E. coli protein:

pir||S56459 hypothetical protein o457 - Escherichia coli >gi|537075 (U14003)
ORF_o457 [Escherichia coli] >gi|1790680 (AE000494) hypothetical 48.5 kD protein
in fbp-pmba intergenic region [Escherichia coli] Length = 457
Score = 474 bits (1207), Expect = e-133

Identities = 249/439 (56%), Positives = 294/439 (66%), Gaps = 13/439 (2%)

Query: 22 KEAGFKVSGCDAKMYPPMSTQLEALGIGVHEGFDAAQLEEFQADIYVIGNVARRGMDVVE 81
++ G +V+G DA +YPPMST LE GI + +G+DA+QLE Q D+ + IGN RG VE
Sbjct: 21 RQLGHEVTGSDANVYPPMSTLLEKQGIELIQGYDASQLEP-QPDLVIIGNAMTRGNPCVE 79

Query: 82 AILNRGLPYISGPQWLAENVLHHHWVLGVAGTHGKTTTASMLAWVLEYAGLAPGFLIGGV 141
A+L + +PY+SGPQWL + VL WVL VAGTHGKTTTA M W+LE G PGF+IGGV
Sbjct: 80 AVLEKNIPYMSGPQWLHDFVLRDRWVLAVAGTHGKTTTAGMATWILEQCGYKPGFVIGGV 139

Query: 142 PENFGVSARLPQTPRQDPNSKSPFFVIEADEYDTAFFDKRSKFVHYRPRTAVLNNLEFDH 201
P NF VSA L +S FFVIEADEYD AFFDKRSKFVHY PRT +LNNLEFDH
Sbjct: 140 PGNFEVSAHL GESDFFVIEADEYDCAFFDKRSKFVHYCPRTLILNNLEFDH 190

Query: 202 ADIFADLGAIQTQFHHLVRTVPSEGLIVCNGQQQSLQDTLDKGCWTPVEKFGTGHGWQIG 261
ADIF DL AIQ QFHHLVR VP +G I+ +L+ T+ GCW+ EG WQ
Sbjct: 191 ADIFDDLKAIQKQFHHLVRIVPGQGRIIWPENDINLKQTMAMGCWSEQELVGEQGHWQAK 250

Query: 262 EVNADGS-FDVLLDGKKAGHVAWDLMGGHNRMNALAVIAAARHAGVDVQTACEALGAFKN 320
++ D S ++VLLDG+K G V W L+G HN N L IAAARH GV A ALG+F N
Sbjct: 251 KLTTDASEWEVLLDGEKVGEVKWSLVGEHNMHNGLMAIAAARHVGVAPADAANALGSFIN 310

Query: 321 VKRRMEIKGTANGITVYDDFAHHPTAIETTIQGLRQRVGG-ARILAVLEPRSNTMKLGTM 379
+RR+E++G ANG+TVYDDFAHHPTAI T+ LR +VGG ARI+AVLEPRSNTMK+G
Sbjct: 311 ARRRLELRGEANGVTVYDDFAHHPTAILATLAALRGKVGGTARIIAVLEPRSNTMKMGIC 370

Query: 380 KSALPASLKEADQVF-CYAGGADWDVAEALAPLGCRLRVGKDFDTFVAEIVKNARTGDHI 438
K L SL AD+VF W VAE D DT +VK A+ GDHI
Sbjct: 371 KDDLAPSLGRADEVFLLQPAHIPWQVAEVAEACVQPAHWSGDVDTLADMVVKTAQPGDHI 430

Query: 439 LVMSNGGFGGIHTKLLDAL 457
LVMSNGGFGGIH KLLD L
Sbjct: 431 LVMSNGGFGGIHQKLLDGL 449
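
The percentages in the search summary above can be reproduced from the reported counts. They are consistent with integer truncation rather than rounding (249/439 is about 56.7%, reported as 56%). The following is a minimal sketch in Python under that assumption, which is an inference from the reported figures and is not stated in the text; the name `truncated_percent` is hypothetical.

```python
def truncated_percent(count: int, length: int) -> int:
    """Percentage of count over length, truncated to a whole number.

    Assumption: the summary line reports whole-number percentages obtained
    by integer division, not by rounding to nearest.
    """
    return (100 * count) // length


ALIGNED_LENGTH = 439
COUNTS = {"Identities": 249, "Positives": 294, "Gaps": 13}

if __name__ == "__main__":
    for label, count in COUNTS.items():
        print(f"{label} = {count}/{ALIGNED_LENGTH} ({truncated_percent(count, ALIGNED_LENGTH)}%)")
```

Rounding to nearest would instead give 57%, 67% and 3%, which is why truncation is assumed here.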

Based on this analysis, it was predicted that these proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



ORF132-1 (26.4 kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
20A shows the results of affinity purification of the His-fusion protein, and Figure 20B shows the
results of expression of the GST-fusion in E. coli. Purified His-fusion protein was used to immunise
mice, whose sera were used for FACS analysis (Figure 20C) and ELISA (positive result). These
experiments confirm that ORF132 is a surface-exposed protein, and that it is a useful immunogen.



Example 103 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 875>:



1 . CCGGGCTATT ACGGCTCGGA TGACGAATTT AAGCGGGCAT TCGGAGAAAA

51 CTCGCCGACA TmCAAGAAAC ATTGCAACCG GAGCTGCGGG ATTTATGAAC

101 CCGTATTGAA AAAATACGGC AAAAAGCGCG CCAACAACCA TTCGGTCAGC

151 ATTAGTGCGG ACTTCGGCGA TTATTTCATG CCGTTCGCCA GCTATTCGCG

201 CACACACCGT ATGCCCAACA TCCAAGAAAT GTATTTTTCC CAAATCGGCG

251 ACTCCGGCGT TCACACCGCC TTAAAACCAG AGCGCGCAAA CACTTGGCAA

301 TTTGGCTTCr ATACCTATAA AAAAGGATTG TTAAAACAAG ATGATACATT

351 AGGATTAAAA CTGGTCGGCT ACCGCAGCCG CATCGACAAC TACATCCACA

401 ACGTTTACGG GAAATGGTGG GATTTGAACG GGGATATTCC GAGCTGGGTC 

451 AGCAGCACCG GGCTTGCCTA CACCATCCAA CATCGCrATT TCAwAGACAA

501 AGTGCATCAA nnnnnnnnnn nnnnnnnnnn nnnnTACGAT TATGGGCGTT 

551 TTTTCACCAA CCTTTCTTAC GCCTATCAAA AAAGCACGCA ACCGACCAAC 

601 TTCAGCGATG CGAGCGAATC GCCCAACAAT GCGTCCAAAG AAGACCAACT 

651 CAAACAAGGT TATGGGTTGA GCAGGGTTTC CGCCCTGCCG CGAGATTACG 

701 GACGTTTGGA AGTCGGTACG CGCTGGTTGG GCAACAAACT GACTTTGGGC

751 GGCGCGATGC GCTATTTCGG CAAGAGCATC CGCGCGACGG CTGAAGAACG 

801 CTATATCGAC GGCACCAACG GGGGAAATAC CAGCAATTTC CGGCAACTGG 

851 GCAAGCGTTC CATCAAACAA ACCGAAACTC TTGCCCGCCA GCCTTTGATT

901 TTwGATTTTa ACGCCGCTTA CGAGCCGAAG AAAAACCTTA TTTTCCGCGC 

951 CGAAGTCAAA AATCTGTTCG ACAGGCGTTA TATCGATCCG CTCGATGCGG 

1001 GCAATGATGC GGCAAC.GAG CGTTATTACA GCTCGTTCGA CCCGAAAGAC

1051 AAGGACrrAG ACGTAACGTG TAATGCTGAT AAAACGTTGT GCaACGGCAA 

1101 ATACGGCGGC ACAAGCAAAA GCGTATTGAC CAATTTTGCA CGCGGACGCA 

1151 CCTTTTTgAT GACGATGAGC TACAAGTTTT AA 

This corresponds to the amino acid sequence <SEQ ID 876; ORF133>: 



1 . . PGYYGSDDEF KRAFGENSPT XKKHCNRSCG IYEPVLKKYG KKRANNHSVS

51 ISADFGDYFM PFASYSRTHR MPNIQEMYFS QIGDSGVHTA LKPERANTWQ 

101 FGFXTYKKGL LKQDDTLGLK LVGYRSRIDN YIHNVYGKWW DLNGDIPSWV 

151 SSTGLAYTIQ HRXFXDKVHQ XXXXXXXXYD YGRFFTNLSY AYQKSTQPTN 

201 FSDASESPNN ASKEDQLKQG YGLSRVSALP RDYGRLEVGT RWLGNKLTLG 

251 GAMRYFGKSI RATAEERYID GTNGGNTSNF RQLGKRSIKQ TETLARQPLI 

301 XDFNAAYEPK KNLIFRAEVK NLFDRRYIDP LDAGNDAAXE RYYSSFDPKD 

351 KDXDVTCNAD KTLCNGKYGG TSKSVLTNFA RGRTFLMTMS YKF* 

Further work revealed the further partial DNA sequence <SEQ ID 877>: 



1 GAGGCGCAGA TACAGGTTTT GGAAGATGTG CACGTCAAGG CGAAGCGCGT 

51 ACCGAAAGAC AAAAAAGTGT TTACCGATGC GCGTGCCGTA TCGACCCGTC 

101 AGGATATATT CAAATCCAGC GAAAACCTCG ACAACATCGT ACGCAGCATC 

151 CCCGGTGCGT TTACACAGCA AGATAAAAGC TCGGGCATTG TGTCTTTGAA 

201 TATTCGCGGC GACAGCGGGT TCGGGCGGGT CAATACGATG GTGGACGGCA 

251 TCACGCAGAC CTTTTATTCG ACTTCTACCG ATGCGGGCAG GGCAGGCGGT 

301 TCATCTCAAT TCGGTGCATC TGTCGACAGC AATTTTATTG CCGGACTGGA 

351 TGTCGTCAAA GGCAGCTTCA GCGGCTCGGC AGGCATCAAC AGCCTTGCCG 

401 GTTCGGCGAA TCTGCGGACT TTAGGCGTGG ATGACGTCGT TCAGGGCAAT

451 AATACCTACG GCCTGCTGCT AAAAGGTCTG ACCGGCACCA ATTCAACCAA

501 AGGTAATGCG ATGGCGGCGA TAGGTGCGCG CAAATGGCTG GAAAGCGGAG 

551 CATCTGTCGG TGTGCTTTAC GGGCACAGCA GGCGCAGCGT GGCGCAAAAT 

601 TACCGCGTGG GCGGCGGCGG GCAGCACATC GGAAATTTTG GCGCGGAATA 

651 TTTGGAACGG CGCAAGCAGC GATATTTTGT ACAAGAGGGT GCTTTGAAAT 

701 TCAATTCCGA CAGCGGAAAA TGGGAGCGGG ATTTACAAAG GCAACAGTGG 

751 AAATACAAGC CGTATAAAAA TTACAACAAC CAAGAACTAC AaAAATACAT 

801 CGAAGAGCAT GACAAAAGCT GGCGGGAAAA CCTg.CaCCG CAATACGACA 

851 TTACCCCCAT CGATCCGTCC AGCCTGAAGC AGCAGTCGGC AGGCAATCTG 

901 TTTAAATTGG AATACGACGG CGTATTCAAT AAATACACGG CGCAATTTCG 

951 CGATTTAAAC ACCAAAATCG GCAGCCGCAA AATCATCAAC CGCAATTATC

1001 AGTTCAATTA CGGTTTGTCT TTGAACCCGT ATACCAACCT CAATCTGACC 

1051 GCAGCCTACA ATTCGGGCAG GCAGAAATAT CCGAAAGGGT CGAAGTTTAC

1101 AGGCTGGGGG CTTTTAAAGG ATTTTGAAAC CTACAACAAC GCGAAAATCC 

1151 TCGACCTCAA CAACACCGCC ACCTTCCGGC TGCCCCGCGA AACCGAGTTG 

1201 CAAACCACTT TGGGCTTCAA TTATTTCCAC AACGAATACG GCAAAAACCG 

1251 CTTTCCTGAA GAATTGGGGC TGTTTTTCGA CGGTCCTGAT CAGGACAACG 

1301 GGCTTTATTC CTATTTGGGG CGGTTTAAGG GCGATAAAGG GCTGCTGCCC 

1351 CAAAAATCAA CCATTGTCCA ACCGGCCGGC AGCCAATATT TCAACACGTT 

1401 CTACTTCGAT GCCGCGCTCA AAAAAGACAT TTACCGCTTA AACTACAGCA

1451 CCAATACCGT CGGCTACCGT TTCGGCGGCG AATATACGGG CTATTACGGC

1501 TCGGATGACG AATTTAAGCG GGCATTCGGA GAAAACTCGC CGACATACAA 

1551 GAAACATTGC AACCGGAGCT GCGGGATTTA TGAACCCGTA TTGAAAAAAT 

1601 ACGGCAAAAA GCGCGCCAAC AACCATTCGG TCAGCATTAG TGCGGACTTC 

1651 GGCGATTATT TCATGCCGTT CGCCAGCTAT TCGCGCACAC ACCGTATGCC 

1701 CAACATCCAA GAAATGTATT TTTCCCAAAT CGGCGACTCC GGCGTTCACA 

1751 CCGCCTTAAA ACCAGAGCGC GCAAACACTT GGCAATTTGG CTTCAATACC

1801 TATAAAAAAG GATTGTTAAA ACAAGATGAT ACATTAGGAT TAAAACTGGT 

1851 CGGCTACCGC AGCCGCATCG ACAACTACAT CCACAACGTT TACGGGAAAT 

1901 GGTGGGATTT GAACGGGGAT ATTCCGAGCT GGGTCAGCAG CACCGGGCTT

1951 GCCTACACCA TCCAACATCG CAATTTCAAA GACAAAGTGC ACAAACACGG

2001 TTTTGAGTTG GAGCTGAATT ACGATTATGG GCGTTTTTTC ACCAACCTTT

2051 CTTACGCCTA TCAAAAAAGC ACGCAACCGA CCAACTTCAG CGATGCGAGC

2101 GAATCGCCCA ACAATGCGTC CAAAGAAGAC CAACTCAAAC AAGGTTATGG

2151 GTTGAGCAGG GTTTCCGCCC TGCCGCGAGA TTACGGACGT TTGGAAGTCG

2201 GTACGCGCTG GTTGGGCAAC AAACTGACTT TGGGCGGCGC GATGCGCTAT

2251 TTCGGCAAGA GCATCCGCGC GACGGCTGAA GAACGCTATA TCGACGGCAC

2301 CAACGGGGGA AATACCAGCA ATTTCCGGCA ACTGGGCAAG CGTTCCATCA

2351 AACAAACCGA AACTCTTGCC CGCCAGCCTT TGATTTTTGA TTTTTACGCC

2401 GCTTACGAGC CGAAGAAAAA CCTTATTTTC CGCGCCGAAG TCAAAAATCT

2451 GTTCGACAGG CGTTATATCG ATCCGCTCGA TGCGGGCAAT GATGCGGCAA

2501 CGCAGCGTTA TTACAGCTCG TTCGACCCGA AAGACAAGGA CGAAGACGTA

2551 ACGTGTAATG CTGATAAAAC GTTGTGCAAC GGCAAATACG GCGGCACAAG

2601 CAAAAGCGTA TTGACCAATT TTGCACGCGG ACGCACCTTT TTGATGACGA

2651 TGAGCTACAA GTTTTAA



This corresponds to the amino acid sequence <SEQ ID 878; ORF133-1>:

1 EAQIQVLEDV HVKAKRVPKD KKVFTDARAV STRQDIFKSS ENLDNIVRSI

51 PGAFTQQDKS SGIVSLNIRG DSGFGRVNTM VDGITQTFYS TSTDAGRAGG

101 SSQFGASVDS NFIAGLDVVK GSFSGSAGIN SLAGSANLRT LGVDDVVQGN

151 NTYGLLLKGL TGTNSTKGNA MAAIGARKWL ESGASVGVLY GHSRRSVAQN

201 YRVGGGGQHI GNFGAEYLER RKQRYFVQEG ALKFNSDSGK WERDLQRQQW

251 KYKPYKNYNN QELQKYIEEH DKSWRENLXP QYDITPIDPS SLKQQSAGNL

301 FKLEYDGVFN KYTAQFRDLN TKIGSRKIIN RNYQFNYGLS LNPYTNLNLT

351 AAYNSGRQKY PKGSKFTGWG LLKDFETYNN AKILDLNNTA TFRLPRETEL

401 QTTLGFNYFH NEYGKNRFPE ELGLFFDGPD QDNGLYSYLG RFKGDKGLLP

451 QKSTIVQPAG SQYFNTFYFD AALKKDIYRL NYSTNTVGYR FGGEYTGYYG

501 SDDEFKRAFG ENSPTYKKHC NRSCGIYEPV LKKYGKKRAN NHSVSISADF

551 GDYFMPFASY SRTHRMPNIQ EMYFSQIGDS GVHTALKPER ANTWQFGFNT

601 YKKGLLKQDD TLGLKLVGYR SRIDNYIHNV YGKWWDLNGD IPSWVSSTGL

651 AYTIQHRNFK DKVHKHGFEL ELNYDYGRFF TNLSYAYQKS TQPTNFSDAS

701 ESPNNASKED QLKQGYGLSR VSALPRDYGR LEVGTRWLGN KLTLGGAMRY

751 FGKSIRATAE ERYIDGTNGG NTSNFRQLGK RSIKQTETLA RQPLIFDFYA

801 AYEPKKNLIF RAEVKNLFDR RYIDPLDAGN DAATQRYYSS FDPKDKDEDV

851 TCNADKTLCN GKYGGTSKSV LTNFARGRTF LMTMSYKF*



Computer analysis of this amino acid sequence gave the following results:

Homology with the probable TonB-dependent receptor HI121 of H. influenzae (accession number U3280O)
ORF133 and HI121 show 57% aa identity in 363aa overlap:

IYEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTA 90 
I EP+L K G K+A NHS ++SA+ DYFMPF +YSRTHRMPNIQEM+FSQ+ ++GV+TA 
INEPILHKSGHKKAFNHSATLSAELSDYFMPFFTYSRTHRMPNIQEMFFSQVSNAGVNTA 622 

LKPERANTWQFGFXTYKKGLLKQDDTLGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWV 150 
LKPE+++T+Q GF TYKKGL QDD LG+KLVGYRS I NYIHNVYG WW +P+W 





Orf133: 31
HI121: 563

Orf133: 91
HI121: 623

Orf133: 151
HI121: 681

Orf133: 211
HI121: 741

Orf133: 271
HI121: 801

Orf133: 331
HI121: 860



S G YTI H+ + 



YD GRFF N+SYAYQ++ QPTN++DAS PNN 



AS+ED LKQGYGLSRVS LP+DYGRLE+GTRW KLTLG A RY+GKS RAT EE YI+ 



G+ 



R-f 



++K+TE + +QP+I D + +YEP K+LI +AEV+NL D+RY+DP 



LDAGNDAA +RYYSS 



+ C D + C 



GG+ K+VL NFARGRT++++++
Orf133: 391 YKF 393
YKF 

HI121: 911 YKF 913 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF133 shows 90.8% identity over a 392aa overlap with an ORF (ORF133a) from strain A of N. 
meningitidis: 

10 20 30 

orf 133 .pep PGYYGSDDEFKRAFGENSPTXKKHCNRSCGI 

III I I I I M I I I I 1 I I I I 1111:1111 
orf 133a FYFDAALKKDIYRLNYSTNTVGYRFGGXYTGYYXSDDEFKRAFGENSPTYXKHCNQSCGI 
450 460 470 480 490 500 

40 50 60 70 80 90 

YEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTAL 
I M I I I M I I I I I I I I I I I M II I I I I II I I I I I I I t II M I I I II I I M I I II I I M II 
YEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTAL 
510 520 530 540 550 560 

100 110 120 130 140 150 

KPERANTWQFGFXTYKKGLLKQDDTLGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWVS 
I I I II I M I II I I I I II I II I i I I M II I I I I I I I I I I M M II I II I I! : M I II I 
KPERANTWQFGFNTYKKGLLKQDDILGLKLVGYRSRIDXYIHNVYGKWWDLNGNIPSWVS 
570 580 590 600 610 620 

160 170 180 190 200 210 

STGLAYTIQHRXFXDKVHQXXXXXXXXYDYGRFFTNLSYAYQKSTQPTNFSDASESPNNA
M I II I I II I I I Ml): Ml | | || | I | | I | I | M | M I I I I M I M I I I 

STGLAYTIQHRNFKDKVHKHGFELELNYDYXRFFTNLSYAYQKSTQPTNFSDASESPNNA 
630 640 650 660 670 680 

220 230 240 250 260 270 

SKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDG 
I I I M I I I I I I M M I I I II II I II I II I I I I I I I I I I I II I I II II I II I I I 1 I II M 
SKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDX 
690 700 710 720 730 740 

280 290 300 310 320 330 

TNGGNTSNFRQLGKRSIKQTETLARQPLIXDFNAAYEPKKNLIFRAEVKNLFDRRYIDPL 

Ml M M M II II It II I II M M M I I II I M I I M II M II I I M M I I II 

TNGXXTSNFRQLGKRSIXQTETLARQPLIFDXYAAYEPKKXLIFRAEVKNLFDRRYIDPL 
750 760 770 780 790 800 

340 350 360 370 380 390 

orf 133 . pep DAGNDAAXERYYSSFDPKDKDXDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSY 
I II I I II : : M I I M M I M I Mill I M I M I I M II M I I M M I M I II M I M 
orf 133a DAGNDAATQRYYSSFDPKDKDEEVTCNDDNTLCNGKYGGTSKSVLTNFARGXTFLITMSY 
810 820 830 840 850 860 






orf133.pep KFX
I I I
orf133a KFX
870 

A partial ORF133a nucleotide sequence <SEQ ID 879> is: 

1 AAAGACAAAA AAGTGTTTAC CGATGCGCGT GCCGTATCGA CCCGTCAGGA 
51 TATATTCAAA TCCANCGAAA ACCTCGACAA CATCGTACGC ANCATCCCCG 

101 GTGCGTTTAC ACANCAANAT AAAAGCTCGG GCNTTGTGTC TTTGAATATT 

151 CGCNGCGACA GCGGGTTCGG GCGGGTCAAT ACNATGGTNG ACGGCATCAC 

201 NCANACCTTT TATTCGACTT CTACCGATGC GGGCAGGGCA GGCGGTTCAT 

251 CTCAATTCGG TGCATCTGTC GACAGCAATT TTATNGCCGG ACTGGATGTC 

301 GTCAAAGGCA GCTTCAGCGG CTCGGCAGGC ATCAACAGCC TTGCCGGTTC 

351 GGCGAATCTG CGGACTTTAN GCGTGGATGA TGTCGTTCAG GGCAATANTA 

401 CNTACGGCCT GCTGCTAAAA GGTCTGACCG GCACCAATTC AACCAAAGGT

451 AATGCGATGG CGGCGATAGG TGCGCGCAAA TGGCTGGAAA GCGGAGCATC

501 TGTCGGTGTG CTTTACGGGC ACAGCAGGCG CAGCGTGGCG CAAAATTACC

551 GCGTGGGCGG CGGCGGGCAG CACATCGGAA ATTTTGGCGC GGAATATCTG

601 GAACGACGCA AGCAACGATA TTTTGAGCAA GAAGGCGGGT TGAAATTCAA

651 TTCCAACAGC GGAAAATGGG AGCGGGATTT CCAAAAGTCG TACTGGAAAA

701 CCAAGTGGTA TCAAAAATAC GATGCCCCCC AAGAACTGCA AAAATACATC

751 GAAGGTCATG ATAAAAGCTG GCGGGAAAAC CTGGCGCCGC AATACGACAT

801 CACCCCCATC GATCCGTCCA GCCTGAAGCN GCAGTCGGCA GGCAACCTGT

851 TTAAATTGGA ATACGACGGC GTATTCAATA AATACACGGC GCAATTTCGC

901 GATTTAAACA CCAAAATCGG CAGCCGCAAA ATCATCAACC GCAATTATCA

951 ATTCAATTAC GGTTTGTCTT TGAACCCGTA TACCAACCTC AATCTGACCG

1001 CAGCCTACAA TTCGGGCAGG CAGAAATATC CGAAAGGGTC GAAGTTTACA

1051 GGCTGGGGGC TTTTNAAAGA TTTTGAAACC TACAACAACG CAAAAATCCT

1101 CGACCTCANC AACACCTCCA CCTTCCGGCT GCCCCGTGAA ACCGAGTTGC

1151 AAACCACTTT GGGCTTCAAT TATTTCCACA ACGAATACGG CAAAAACCGC

1201 TTTCCTGAAG AATTGGGGCT GTTTTTCGAC GGTCCGGATC ANGACAACGG

1251 GCTTTATTCC TATTTGGGGC GGTTTAAGGG CGATAAAGGG CTGCTGCCCC

1301 AAAAATCAAC CATTGTCCAA CCGGCCGGCA GCCAATATTT CAACACGTTC

1351 TACTTCGATG CCGCGCTCAA AAAAGACATT TACCGCTTAA ACTACAGCAC

1401 CAATACCGTC GGCTACCGTT TCGGCGGCNA ATATACGGGC TATTACNGCT

1451 CGGATGACGA ATTTAAGCGG GCATTCGGAG AAAACTCGCC GACATACANG

1501 AAACATTGCA ACCAGAGCTG CGGAATTTAT GAACCCGTAT TGAAAAAATA

1551 CGGCAAAAAG CGCGCCAACA ACCATTCGGT CAGCATTAGT GCGGACTTCG

1601 GCGATTATTT CATGCCGTTC GCCAGCTATT CGCGCACACA CCGTATGCCC

1651 AACATCCAAG AAATGTATTT TTCCCAAATC GGCGACTCCG GCGTTCACAC

1701 CGCCTTAAAA CCAGAGCGCG CAAACACTTG GCAATTTGGC TTCAATACCT

1751 ATAAAAAAGG ATTGTTAAAA CAAGATGATA TATTAGGATT AAAACTGGTC

1801 GGCTACCGCA GCCGCATCGA CNACTACATC CACAACGTTT ACGGGAAATG

1851 GTGGGATTTG AACGGGAATA TTCCGAGCTG GGTCAGCAGC ACCGGGCTTG

1901 CCTACACCAT CCAACACCGC AATTTCAAAG ACAAAGTGCA CAAACACGGT

1951 TTTGAGTTGG AGCTGAATTA CGATTATNGG CGTTTTTTCA CCAACCTTTC

2001 TTACGCCTAT CAAAAAAGCA CGCAACCGAC CAACTTCAGC GATGCGAGCG

2051 AATCGCCCAA CAATGCGTCC AAAGAAGACC AACTCAAACA AGGTTATGGG

2101 TTGAGCAGGG TTTCCGCCCT GCCGCGAGAT TACGGACGTT TGGAAGTCGG

2151 TACGCGCTGG TTGGGCAACA AACTGACTTT GGGCGGCGCG ATGCGCTATT

2201 TCGGCAAGAG CATCCGCGCG ACGGCTGAAG AACGCTATAT CGACGNCACC

2251 AATGGGGNAN NTACCAGCAA TTTCCGGCAA CTGGGCAAGC GTTCCATCAN

2301 ACAAACCGAA ACCCTTGCCC GCCAGCCTTT GATTTTTGAT TTNTACGCCG

2351 CTTACGAGCC GAAGAAAAAN CTTATTTTCC GCGCCGAAGT CAAAAATCTG

2401 TTCGACAGGC GTTATATCGA TCCGCTCGAT GCGGGCAATG ATGCGGCAAC

2451 GCAGCGTTAT TACAGTTCGT TCGACCCGAA AGACAAGGAC GAAGAAGTAA

2501 CGTGTAATGA TGATAACACG TTATGCAACG GCAAATACGG CGGCACAAGC

2551 AAAAGCGTAT TGACCAATTT TGCACGCGGA CNCACCTTTT TGATAACGAT

2601 GAGCTACAAG TTTTAA



This encodes a protein having (partial) amino acid sequence <SEQ ID 880>:

1 KDKKVFTDAR AVSTRQDIFK SXENLDNIVR XIPGAFTXQX KSSGXVSLNI

51 RXDSGFGRVN TMVDGITXTF YSTSTDAGRA GGSSQFGASV DSNFXAGLDV

101 VKGSFSGSAG INSLAGSANL RTLXVDDVVQ GNXTYGLLLK GLTGTNSTKG

151 NAMAAIGARK WLESGASVGV LYGHSRRSVA QNYRVGGGGQ HIGNFGAEYL

201 ERRKQRYFEQ EGGLKFNSNS GKWERDFQKS YWKTKWYQKY DAPQELQKYI

251 EGHDKSWREN LAPQYDITPI DPSSLKXQSA GNLFKLEYDG VFNKYTAQFR

301 DLNTKIGSRK IINRNYQFNY GLSLNPYTNL NLTAAYNSGR QKYPKGSKFT

351 GWGLXKDFET YNNAKILDLX NTSTFRLPRE TELQTTLGFN YFHNEYGKNR

401 FPEELGLFFD GPDXDNGLYS YLGRFKGDKG LLPQKSTIVQ PAGSQYFNTF

451 YFDAALKKDI YRLNYSTNTV GYRFGGXYTG YYXSDDEFKR AFGENSPTYX

501 KHCNQSCGIY EPVLKKYGKK RANNHSVSIS ADFGDYFMPF ASYSRTHRMP

551 NIQEMYFSQI GDSGVHTALK PERANTWQFG FNTYKKGLLK QDDILGLKLV

601 GYRSRIDXYI HNVYGKWWDL NGNIPSWVSS TGLAYTIQHR NFKDKVHKHG

651 FELELNYDYX RFFTNLSYAY QKSTQPTNFS DASESPNNAS KEDQLKQGYG

701 LSRVSALPRD YGRLEVGTRW LGNKLTLGGA MRYFGKSIRA TAEERYIDXT

751 NGXXTSNFRQ LGKRSIXQTE TLARQPLIFD XYAAYEPKKX LIFRAEVKNL

801 FDRRYIDPLD AGNDAATQRY YSSFDPKDKD EEVTCNDDNT LCNGKYGGTS

851 KSVLTNFARG XTFLITMSYK F*


ORF133a and ORF133-1 show 94.3% identity in 871 aa overlap: 



10 20 30 40

orf133a.pep KDKKVFTDARAVSTRQDIFKSXENLDNIVRXIPGAFTXQXKS
I I M I M II M II I I I M I I I M II II II I I I I I I I II
orf133-1 EAQIQVLEDVHVKAKRVPKDKKVFTDARAVSTRQDIFKSSENLDNIVRSIPGAFTQQDKS
10 20 30 40 50 60 

50 60 70 80 90 100 

orf133a.pep SGXVSLNIRXDSGFGRVNTMVDGITXTFYSTSTDAGRAGGSSQFGASVDSNFXAGLDVVK
f I [Mill I I I I I M I I I M f I I II I I I I I I I M I I I I II I I M I I I I I 1 II M M 
orf133-1 SGIVSLNIRGDSGFGRVNTMVDGITQTFYSTSTDAGRAGGSSQFGASVDSNFIAGLDVVK

70 80 90 100 110 120 



110 120 130 140 150 160 

orf133a.pep GSFSGSAGINSLAGSANLRTLXVDDVVQGNXTYGLLLKGLTGTNSTKGNAMAAIGARKWL
M M I I I It M I M M I I It I MINIM M M I M t M M I M I II M I I M It M I 
orf133-1 GSFSGSAGINSLAGSANLRTLGVDDVVQGNNTYGLLLKGLTGTNSTKGNAMAAIGARKWL
130 140 150 160 170 180

170 180 190 200 210 220 

orf 133a. pep ESGASVGVLYGHSRRSVAQNYRVGGGGQHIGNFGAEYLERRKQRYFEQEGGLKFNSNSGK 

I I M M I I I It II I M I M I I M II I M M I M M I M M I I M M I I I : I I M t : I M 
orf133-1 ESGASVGVLYGHSRRSVAQNYRVGGGGQHIGNFGAEYLERRKQRYFVQEGALKFNSDSGK

190 200 210 220 230 240 

230 240 250 260 270 280 

orf 133a . pep WERDFQKSYWKTKWYQKYDAPQELQKYIEGHDKSWRENLAPQYDITPIDPSSLKXQSAGN 

I M I : I : : I I I I : : I : I M I I M I I I M I II I I M 1 I I I I 1 I I I M t 1 I t I 1
orf133-1 WERDLQRQQWKYKPYKNYNN-QELQKYIEEHDKSWRENLXPQYDITPIDPSSLKQQSAGN
250 260 270 280 290

290 300 310 320 330 340

orf133a.pep LFKLEYDGVFNKYTAQFRDLNTKIGSRKIINRNYQFNYGLSLNPYTNLNLTAAYNSGRQK
M I I I M I M I M I I t I I M I M M I I M I M M It II I I I I M I I I M I 1 I I M M I I I
orf133-1 LFKLEYDGVFNKYTAQFRDLNTKIGSRKIINRNYQFNYGLSLNPYTNLNLTAAYNSGRQK
300 310 320 330 340 350

350 360 370 380 390 400

orf133a.pep YPKGSKFTGWGLXKDFETYNNAKILDLXNTSTFRLPRETELQTTLGFNYFHNEYGKNRFP
I 1 I I I 1 M I ! t 1 M I I I I I I M ) I I ) I I : II M I I I I I I M M I I I I I I I I I I 1 I II I
orf133-1 YPKGSKFTGWGLLKDFETYNNAKILDLNNTATFRLPRETELQTTLGFNYFHNEYGKNRFP
360 370 380 390 400 410

410 420 430 440 450 460

orf133a.pep EELGLFFDGPDXDNGLYSYLGRFKGDKGLLPQKSTIVQPAGSQYFNTFYFDAALKKDIYR
II I M II II II I II I M I I I l I M I I I II I I II I M I I I I II I I II M 1 I II I I I 11 II
orf133-1 EELGLFFDGPDQDNGLYSYLGRFKGDKGLLPQKSTIVQPAGSQYFNTFYFDAALKKDIYR
420 430 440 450 460 470

470 480 490 500 510 520 

orf 133a. pep LNYSTNTVGYRFGGXYTGYYXSDDEFKRAFGENSPTYXKHCNQSCGIYEPVLKKYGKKRA 

II I I II I M I II I I I I I I I I II I I II II I I! I I II 1 I M : I I I I I I I I I I I I I I I I I 
orf133-1 LNYSTNTVGYRFGGEYTGYYGSDDEFKRAFGENSPTYKKHCNRSCGIYEPVLKKYGKKRA

480 490 500 510 520 530 

530 540 550 560 570 580 

orf 133a. pep NNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTALKPERANTWQFGFN 
I I I I I M II I I M I I I I II I M I I I I I I II I II I I I I I I I I I I I I II II I I I I I I I I I I I

orf 133-1 NNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTALKPERANTWQFGFN 
540 550 560 570 580 590 

590 600 610 620 630 640 

orf133a.pep TYKKGLLKQDDILGLKLVGYRSRIDXYIHNVYGKWWDLNGNIPSWVSSTGLAYTIQHRNF

I I II I I I I I II I I I I I I M II I I I I I I I I I M I I I I M : M I I I M I I II II I I M I I 
orf 133-1 TYKKGLLKQDDTLGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWVSSTGLAYTIQHRNF 
600 610 620 630 640 650 

650 660 670 680 690 700

orf 133a. pep KDKVHKHGFELELNYDYXRFFTNLSYAYQKSTQPTNFSDASESPNNASKEDQLKQGYGLS 
M I I I M I I M I M I II II I II II II I I I M I I I II I II I I II I I II II I I M I M I II 
orf 133-1 KDKVHKHGFELELNYDYGRFFTNLSYAYQKSTQPTNFSDASESPNNASKEDQLKQGYGLS 
660 670 680 690 700 710 




710 720 730 740 750 760 

orf133a.pep RVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDXTNGXXTSNFRQLG
| M I I M I I I I I I M M I I I M I I it I I I I I I I I 1 I I I M I I I I M III M I I II I I 
orf 133-1 RVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDGTNGGNTSNFRQLG 
720 730 740 750 760 770 

770 780 790 800 810 820 

orf 133a. pep KRSIXQTETLARQPLIFDXYAAYEPKKXLIFRAEVKNLFDRRYIDPLDAGNDAATQRYYS 
M I I I I I 1 I M II M II MINIM 1 M II I II M II M I II I M I M II II I I I M 
orf133-1 KRSIKQTETLARQPLIFDFYAAYEPKKNLIFRAEVKNLFDRRYIDPLDAGNDAATQRYYS

780 790 800 810 820 830 

830 840 850 860 870 

orf 133a. pep SFDPKDKDEEVTCNDDNTLCNGKYGGTSKSVLTNFARGXTFLITMSYKFX 
I M 11 M I I : 1 I I I I : I I II I I M II I II M M M II II 1 M II II II 
orf133-1 SFDPKDKDEDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSYKFX

840 850 860 870 880 

Homology with a predicted ORF from N. gonorrhoeae 

ORF133 shows 92.3% identity over 392 aa overlap with a predicted ORF (ORF133ng) from N.
gonorrhoeae:

orf133.pep PGYYGSDDEFKRAFGENSPTXKKHCNRSCGI 31
I M I I : : I I I I I I I I M I : I : M : MM
orf133ng FYFDAALKKDIYRLNYSTNAINYRFGGEYTGYYGSENEFKRAFGENSPAYKEHCDPSCGL 560

orf133.pep YEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTAL 91
M I I I I II I II II I I I I I I 1 M I I I I M II I I M II I I II I M II M I II I M I M 1 M I
orf133ng YEPVLKKYGKKRANNHSVSISADFGDYFMPFAGYSRTHRMPNIQEMYFSQIGDSGVHTAL 620

orf133.pep KPERANTWQFGFXTYKKGLLKQDDTLGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWVS 151
I M II I II I I II II I I II II I I I I II M I M M M M I M I I M I I I I I II II II M :
orf133ng KPERANTWQFGFNTYKKGLLKQDDILGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWVG 680

orf133.pep STGLAYTIQHRXFXDKVHQXXXXXXXXYDYGRFFTNLSYAYQKSTQPTNFSDASESPNNA 211
II M I I I I : I I I MM: M M M M I I M II I M M M II I M I M M M
orf133ng STGLAYTIRHRNFKDKVHKHGFELELNYDYGRFFTNLSYAYQKSTQPTNFSDASESPNNA 740

orf133.pep SKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDG 271
M M I II II I II II M I II II M II M M II M M I M II M M M M II M I M M M I
orf133ng SKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDG 800

orf133.pep TNGGNTSNFRQLGKRSIKQTETLARQPLIXDFNAAYEPKKNLIFRAEVKNLFDRRYIDPL 331
M M II M M I II M I I I II I II I I I II M I II II I II II M I II II I M II M II I
orf133ng TNGGNTSNVRQLGKRSIKQTETLARQPLIFDFYAAYEPKKNLIFRAEVKNLFDRRYIDPL 860

orf133.pep DAGNDAAXERYYSSFDPKDKDXDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSY 391
M M I M : : M M M M M M II II M M M M M M M M II M M M M I M M II I
orf133ng DAGNDAATQRYYSSFDPKDKDEDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSY 920

orf133.pep KF 393
I I
orf133ng KF 922

The complete length ORF133ng nucleotide sequence <SEQ ID 881> is predicted to encode a 
protein having amino acid sequence <SEQ ID 882>: 

1 MRSSFRLKPI CFYLMGVMLY HHSYAEDAGR AGSEAQIQVL EDVHVKAKRV 

51 PKDKKVFTDA RAVSTRQDVF KSGENLDNIV RSIPGAFTQQ DKSSGIVSLN 

101 IRGDSGFGRV NTMVDGITQT FYSTSTDAGR AGGSSQFGAS VDSNFIAGLD 

151 VVKGSFSGSA GINSLAGSAN LRTLGVDDVV QGNNTYGLLL KGLTGTNSTK 

201 GNAMAAIGAR KWLESGASVG VLYGHSRRGV AQNYRVGGGG QHIGNFGEEY 

251 LERRKQQYFV QEGGLKFNAG SGKWERDLQR QYWKTKWYKK YEDPQELQKY 

301 IEEHDKSWRE NLAPQYDITP IDPSGLKQQS AGNLLNLEYD GVFNKYTAQF 

351 RDLNTRIGSR KIINRNYQFN YGLSLNPYTN LNLTAAYNSG RQKYPKGAKF 

401 TGWGLLKDFE TYNNAKILDL NNTATFRLPR ETELQTTLGF NYFHNEYGKN 

451 RFPEELGLFF DGPDQDNGLY SYLGRFKGDK GLLPQKSTIV QPAGSQYFNT 

501 FYFDAALKKD IYRLNYSTNA INYRFGGEYT GYYGSENEFK RAFGENSPAY 

551 KEHCDPSCGL YEPVLKKYGK KRANNHSVSI SADFGDYFMP FAGYSRTHRM 

601 PNIQEMYFSQ IGDSGVHTAL KPERANTWQF GFNTYKKGLL KQDDILGLKL 

651 VGYRSRIDNY IHNVYGKWWD LNGDIPSWVG STGLAYTIRH RNFKDKVHKH 

701 GFELELNYDY GRFFTNLSYA YQKSTQPTNF SDASESPNNA SKEDQLKQGY 

751 GLSRVSALPR DYGRLEVGTR WLGNKLTLGG AMRYFGKSIR ATAEERYIDG 

801 TNGGNTSNVR QLGKRSIKQT ETLARQPLIF DFYAAYEPKK NLIFRAEVKN 

851 LFDRRYIDPL DAGNDAATQR YYSSFDPKDK DEDVTCNADK TLCNGKYGGT 

901 SKSVLTNFAR GRTFLMTMSY KF* 

A variant was also identified, encoded by the gonococcal DNA sequence <SEQ ID 883>: 

1 ATGAGATCTT CTTTCCGGTT GAAGCCGATT TGTTTTTATC TTATGGGTGT 

51 TATGCTATAT CATCATAGTT ATGCCGAAGA TGCAGGGCGC GCGGGCAGCG 

101 AGGCGCAGAT ACAGGTTTTG GAAGATGTGC ACGTCAAGGC GAAGCGCGTA 

151 CCGAAAGACA AAAAAGTGTT TACCGATGCG CGTGCCGTAT CGACCCGTca 

201 gGATGTGTTC AAATCCGGCG AAAACCTCGA CAACATCGTA CGCAGCATAC 

251 CCGGTGCGTT TACACAGCAA GATAAAAGCT CGGGCATTGT GTCTTTGAAT 

301 ATTCGCGGCG ACAGCGGGTT CGGGCGGGTC AATACGATGG TGGACGGCAT 

351 CACGCAGACC TTTTATTCGA CTTCTACCGA TGCGGGCAGG GCAGGCGGTT 

401 CATCTCAATT CGGTGCATCT GTCGACAGCA ATTTTATTGC CGGACTGGAT 

451 GTCGTCAAAG GCAGCTTCAG CGGCTCGGCA GGCATCAACA GCCTTGCCGG 

501 TTCGGCGAAT CTGCGGACTT TAGGCGTGGA TGACGTCGTT CAGGGCAATA 

551 ATACCTACGG CCTGCTGCTA AAAGGTCTGA CCGGCACCAA TTCAACCAAA 

601 GGTAATGCGA TGGCGGCGAT AGGTGCGCGC AAATGGCTGG AAAGCGGAGC 

651 GTCTGTCGGT GTGCTTTACG GGCACAGCAG GCGCGGCGTG GCGCAAAATT 

701 ACCGCGTGGG CGGCGGCGGG CAGCACATCG GAAATTTTGG TGAAGAATAT 

751 CTGGAACGGC GCAAACAGCA ATATTTTGTA CAAGAGGGTG GTTTGAAATT 

801 CAATGCCGGC AGCGGAAAAT GGGAACGGGA TTTGCAAAGG CAATACTGGA 

851 AAACAAAGTG GTATAAAAAA TACGAAGACC CCCAAGAACT GCAAAAATAC 

901 ATCGAAGAGC ATGATAAAAG CTGGCGGGAA AACCTGGCGC CGCAATACGA 

951 CATCACCCCC ATCGATCCGT CCGGCCTGAA GCAGCAGTCG GCAGGCAATC 

1001 TGTTTAAATT GGAATACGAC GGCGTATTCA ATAAATACAC GGCGCAATTT 

1051 CGCGATTTAA ACACCAGAAT CGGCAGCCGC AAAATCATCA ACCGCAATTA 

1101 TCAATTCAAT TACGGTTTGT CTTTGAACCC GTATACCAAC CTCAATCTGA 

1151 CCGCAGCCTA CAATTCGGGC AGGCAGAAAT ATCCGAAAGG GGCGAAGTTT 

1201 ACAGGCTGGG GGCTTTTAAA AGATTTTGAA ACCTACAACA ACGCGAAAAT 

1251 CCTCGACCTC AACAACACCG CCACCTTCCG GCTGCCCCGC GAAACCGAGT 

1301 TGCAAACCAC TTTGGGCTTC AATTATTTCC ACAACGAATA CGGCAAAAAC 

1351 CGCTTTCCTG AAGAATTGGG GCTGTTTTTC GACGGTCCTG ATCAGGACAA 

1401 CGGGCTTTAT TCCTATTTGG GGCGGTTTAA GGGCGATAAA GGGCTGTTGC 

1451 CTCAAAAATC AACCATTGTC CAACCGGCCG GCAGCCAATA TTTCAACACG 

1501 TTCTACTTCG ATGCCGCGCT CAAAAAAGAC ATTTACCGCT TAAACTACAG 

1551 CACCAATGCA ATCAACTACC GTTTCGGCGG CGAATATACG GGCTATTACG 

1601 GCTCGGAAAA CGAATTTAAG CGGGCATTCG GAGAAAACTC GCCGGCATAC 

1651 AAGGAACATT GCGACCCGAG CTGCGGGCTT TATGAACCCG TATTGAAAAA 

1701 ATACGGCAAA AAGCGCGCCA ACAACCATTC GGTCAGCATT AGTGCGGACT 

1751 TCGGCGATTA TTTCATGCCG TTCGCCGGCT ATTCGCGCAC ACACCGTATG 

1801 CCCAACATCC AAGAAATGTA TTTTTCCCAA ATCGGCGACT CCGGCGTTCA 

1851 CACCGCCTTA AAACCAGAGC GCGCAAACAC TTGGCAATTT GGCTTCAATA 

1901 CCTATAAAAA AGGATTGTTA AAACAAGATG ATATATTAGG ATTGAAACTG 

1951 GTCGGCTACC GCAGCCGCAT TGACAACTAC ATCCACAACG TTTACGGGAA 

2001 ATGGTGGGAT TTGAACGGGG ATATTCCGAG CTGGGTCGGC AGCACCGGGC 

2051 TTGCCTACAC CATCCGACAC CGCAATTTCA AAGACAAAGT GCACAAACAC 

2101 GGTTTTGAGC TGGAGCTGAA TTACGATTAT GGGCGTTTTT TCACCAACCT 

2151 TTCTTACGCC TATCAAAAAA GCACGCAACC GACCAATTTC AGCGATGCGA 

2201 GCGAATCGCC CAACAATGCC tccaaAGAAG ACCAACTCAA ACAAGGTTAT 

2251 GGGCTGAGCA GGGTTTCCGC CCTGCCGCGA GATTACGGAC GTTTGGAAGT 

2301 CGGTACGCGC TGGTTGGGCA ACAAACTGAC TTTGGGCGGC GCGAtgcGCT 

2351 ATTTCGGCAA GAGCATCCGC GCGACGGCTG AAGAACGCTA TATCGACGGC 

2401 ACCAACGGGG GAAATACCAG CAATGTCCGG CAACTGGGCA AGCGTTCCAT 

2451 CAAACAAACC GAAACCCTTG CCCGACAGCC TTTGATTTTT GATTTTTACG 

2501 CCGCTTACGA GCCGAAGAAA AACCTTATTT TCCGCGCCGA AGTCAAAAAC 

2551 CTGTTCGACA GGCGTTATAT CGATCCGCTC GATGCGGGCA ATGATGCGGC 

2601 AACGCAGCGT TATTACAGCT CGTTCGACCC GAAAGACAAG GACGAAGACG 

2651 TAACGTGTAA TGCTGATAAA ACGTTGTGCA ACGGCAAATA CGGCGGCACA 

2701 AGCAAAAGCG TATTGACCAA TTTCGCACGC GGACGCACCT TCTTGATGAC 

2751 GATGAGCTAC AAGTTTTAA 

This corresponds to the amino acid sequence <SEQ ID 884; ORF133ng-1>: 

1 MRSSFRLKPI CFYLMGVMLY HHSYAEDAGR AGSEAQIQVL EDVHVKAKRV 
51 PKDKKVFTDA RAVSTRQDVF KSGENLDNIV RSIPGAFTQQ DKSSGIVSLN 
101 IRGDSGFGRV NTMVDGITQT FYSTSTDAGR AGGSSQFGAS VDSNFIAGLD 
151 VVKGSFSGSA GINSLAGSAN LRTLGVDDVV QGNNTYGLLL KGLTGTNSTK 
201 GNAMAAIGAR KWLESGASVG VLYGHSRRGV AQNYRVGGGG QHIGNFGEEY 
251 LERRKQQYFV QEGGLKFNAG SGKWERDLQR QYWKTKWYKK YEDPQELQKY 
301 IEEHDKSWRE NLAPQYDITP IDPSGLKQQS AGNLFKLEYD GVFNKYTAQF 
351 RDLNTRIGSR KIINRNYQFN YGLSLNPYTN LNLTAAYNSG RQKYPKGAKF 
401 TGWGLLKDFE TYNNAKILDL NNTATFRLPR ETELQTTLGF NYFHNEYGKN 
451 RFPEELGLFF DGPDQDNGLY SYLGRFKGDK GLLPQKSTIV QPAGSQYFNT 
501 FYFDAALKKD IYRLNYSTNA INYRFGGEYT GYYGSENEFK RAFGENSPAY 
551 KEHCDPSCGL YEPVLKKYGK KRANNHSVSI SADFGDYFMP FAGYSRTHRM 
601 PNIQEMYFSQ IGDSGVHTAL KPERANTWQF GFNTYKKGLL KQDDILGLKL 
651 VGYRSRIDNY IHNVYGKWWD LNGDIPSWVG STGLAYTIRH RNFKDKVHKH 
701 GFELELNYDY GRFFTNLSYA YQKSTQPTNF SDASESPNNA SKEDQLKQGY 
751 GLSRVSALPR DYGRLEVGTR WLGNKLTLGG AMRYFGKSIR ATAEERYIDG 
801 TNGGNTSNVR QLGKRSIKQT ETLARQPLIF DFYAAYEPKK NLIFRAEVKN 
851 LFDRRYIDPL DAGNDAATQR YYSSFDPKDK DEDVTCNADK TLCNGKYGGT 
901 SKSVLTNFAR GRTFLMTMSY KF* 

ORF133ng-1 and ORF133-1 show 96.2% identity in 889 aa overlap: 



10 20 30 40 50 60 

orfl33ng-l.pep SFRLKPICFYLMGVMLYHHSYAEDAGRAGSEAQIQVLEDVHVKAKRVPKDKKVFTDARAV 

! I I i ! ! ! I I ! I I i I I I! I I i I I I ! I 

25 orfl33-l EAQIQVLE DVH VKAKRVPKDKKVFT DARAV 

10 20 30 



70 80 90 100 110 120 

orf 133ng-l . pep STRQDVFKSGENLDNIVRSIPGAFTQQDKSSGIVSLNIRGDSGFGRVNTMVDGITQTFYS 
30 M M I : II I : I I I I I I I I I M I I I I II I II I I M I II I I I II M I I M M I I I M I I I I I 

orf 133-1 STRQDIFKSSENLDNIVRSIPGAFTQQDKSSGIVSLNIRGDSGFGRVNTMVDGITQTFYS 

40 50 60 70 80 90 



130 140 150 160 170 180 

35 orf 133ng-l .pep TSTDAGRAGGSSQFGASVDSNFIAGLDVVKGSFSGSAGINSLAGSANLRTLGVDDWQGN 

I I M II M I I I I I I M M I I I I I I II I II I I I II I I I I I M II I I M I I I II M II M I I 
orf 133-1 TSTDAGRAGGSSQFGASVDSNFIAGLDWKGSFSGSAGINSLAGSANLRTLGVDDVVQGN 

100 110 120 130 140 150 



40 190 200 210 220 230 240 

orf 133ng-l . pep NTYGLLLKGLTGTNSTKGNAMAAIGARKWLESGASVGVLYGHSRRGVAQNYRVGGGGQHI 
I I I I M I I I I I I I I I M I I M i I I I I I I I ! I I I I I I I I N I I !! I M I ! ! II I I I I ! M 
orf 133-1 NTYGLLLKGLTGTNSTKGNAMAAIGARKWLESGASVGVLYGHSRRSVAQNYRVGGGGQHI 

160 170 180 190 200 210 

45 

250 260 270 280 290 300 

orf 133ng-l . pep GNFGEEYLERRKQQYFVQEGGLKFNAGSGKWERDLQRQYWKTKWYKKYEDPQELQKYIEE 
MM I I I I I I I I : I i I I I I : I I I I : I I M M I II M II I M : I : : II II II I I I 
orf 133-1 GNFGAEYLERRKQRYFVQEGALKFNSDSGBCWERDLQRQQWKYKPYKNYNN-QELQKYIEE 
50 220 230 240 250 260 



310 320 330 340 350 360 

orf 133ng-l . pep HDKSWRENLAPQYDITPIDPSGLKQQSAGNLFKLEYDGVFNKYTAQFRDLNTRIGSRKII 
M I II M M M I M I II M I M I I I M M I M M II I M M I M I M M II M M I I I I 
55 orf 133-1 HDKSWRENLXPQYDITPIDPSSLKQQSAGNLFKLEYDGVFNKYTAQFRDLNTKIGSRKII 

270 280 290 300 310 320 



370 380 390 400 410 420 

orf 133ng-l . pep NRNYQFNYGLSLNPYTNLNLTAAYNSGRQKYPKGAKFTGWGLLKDFETYNNAKILDLNNT 
60 I I M M I M I M I M I M M II I M M M I M II M It M M I I M M M M I I M M I I 

orf 133-1 NRNYQFNYGLSLNPYTNLNLTAAYNSGRQKYPKGSKFTGWGLLKDFETYNNAKILDLNNT 
330 340 350 360 370 380 



430 440 450 4 60 470 480 

65 orfl33ng-l .pep ATFRLPRETELQTTLGFNYFHNEYGKNRFPEELGLFFDGPDQDNGLYSYLGRFKGDKGLL 

I M I M M I I I M I II II II I M II I I M M II II M I II I I If II II I M I M II I I M 
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orf 133-1 ATFRLPRETELQTTLGFNYFHNEYGKNRFPEELGLFFDGPDQDNGLYSYLGRFKGDKGLL 
390 400 410 420 430 440 

490 500 510 520 530 540 

5 orf 133ng-l . pep PQKSTIVQPAGSQYFNTFYFDAALKKDIYRLNYSTNAINYRFGGEYTGYYGSENEFKRAF 

M I I I I ( M I ( I I M I ! I f I I I I II I I I I If I M i I : : : I I M M M I M II : : I II I M 
orf 133-1 PQKSTIVQPAGSQYFNTFYFDAALKKDIYRLNYSTNTVGYRFGGEYTGYYGSDDEFKRAF 
450 460 470 480 490 500 

10 550 560 570 580 590 600 

orfl33ng-l.pep GENSPAYKEHCDPSCGLYEPVLKKYGKKRANNHSVSISADFGDYFMPFAGYSRTHRMPNI 
MM!:!!:)!: I I I : I I I 1 M 11 I I II M II I I 1 I ! I) I I I I I I ! II : i I M II I I M 
orf 13 3-1 GENSPTYKKHCNRSCGIYEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNI 
510 520 530 540 550 560 

15 

610 620 630 640 650 660 

orf 133ng-l . pep QEMYFSQIGDSGVHTALKPERANTWQFGFNTYKKGLLKQDDILGLKLVGYRSRIDNYIHN 
I I M I M M I I I II f If II M M I M I M II M I I I I I I I I I I II I I I M I I I I II I I I 
orf 133-1 QEMYFSQIGDSGVHTALKPERANTWQFGFNTYKKGLLKQDDTLGLKLVGYRSRIDNYIHN 
20 570 580 590 600 610 620 

670 680 690 700 710 720 

orfl33ng-l.pep VYGKWWDLNGDIPSWVGSTGLAYTIRHRNFKDKVHKHGFELELNYDYGRFFTNLSYAYQK 
I I I I I I I I I I I I I M I : I II I I I I I : I I I I I I I I I I I I M I I I II I I ! I M I M I I I I I I 
25 orf 133-1 VYGKWWDLNGDIPSWVSSTGLAYTIQHRNFKDKVHKHGFELELNYDYGRFFTNLSYAYQK 

630 640 650 660 670 680 

730 740 750 760 770 780 

orf 133ng-l . pep STQPTNFSDASESPNNASKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMR 
30 I I I M I I M I ( II I M I II I I I I I I M I I II II I I I I II I I I I I II I I I I II I M I CM I 

orf 133-1 STQPTNFSDASESPNNASKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMR 
690 700 710 720 730 740 

790 800 810 820 830 840 

35 orf 133ng-l .pep YFGKSIRATAEERYIDGTNGGNTSNVRQLGKRSIKQTETLARQPLIFDFYAAYEPKKNLI 

I M I I II II I II I II M I II I M I I I I I I I I M I I M I M I II II II M I II I M II II 
orf 133-1 YFGKSIRATAEERYIDGTNGGNTSNFRQLGKRSIKQTETLARQPLIFDFYAAYEPKKNLI 
750 760 770 780 790 800 

40 850 860 870 880 890 900 

orf 133ng-l . pep FRAEVKNLFDRRYIDPLDAGNDAATQRYYSSFDPKDKDEDVTCNADKTLCNGKYGGTSKS 
I I I I I I I I 1 I I I I I I I I I I 1 I I I I I I I I I M I I I I I I I I I I I I I ', I I I I I i I I I i I I I I I 

orf 133-1 FRAEVKNLFDRRYIDPLDAGNDAATQRYYSSFDPKDKDEDVTCNADKTLCNGKYGGTSKS 
810 820 830 840 850 860 

45 

910 920 
orf 133ng-l .pep VLTN FARGRT FLMTMS YKFX 
I I I I II II I I I II II M I I I 
orf 133-1 VLTN FARGRT FLMTMS YKFX 

50 870 880 

In addition, ORF133ng-1 is homologous to a TonB-dependent receptor in H. influenzae: 

sp|P45114|YC17_HAEIN PROBABLE TONB-DEPENDENT RECEPTOR HI1217 PRECURSOR 
>gi|1075372|pir||G64110 transferrin binding protein 1 precursor (tbp1) homolog - 
Haemophilus influenzae (strain Rd KW20) >gi|1574147 (U32801) transferrin binding 
protein 1 precursor (tbp1) [Haemophilus influenzae] Length = 913 

Score = 930 bits (2377), Expect = 0.0 

Identities = 476/921 (51%), Positives = 619/921 (66%), Gaps = 72/921 (7%) 

Query: 38  QVLEDVHVKAKRVPKDKKVFTDARAVSTRQDVFKSGENLDNIVRSIPGAFTQQDKSSGIV 97 
           + L + V K + DKK FT+A+A STR++VFK + +D ++RSIPGAFTQQDK SG+V 
Sbjct: 29  ETLGQIDVVEKVISNDKKPFTEAKAKSTRENVFKETQTIDQVIRSIPGAFTQQDKGSGVV 88 

Query: 98  SLNIRGDSGFGRVNTMVDGITQTFYSTSTDAGRAGGSSQFGASVDSNFIAGLDVVKGSFS 157 
           S+NIRG++G GRVNTMVDG+TQTFYST+ D+G++GGSSQFGA++D NFIAG+DV K +FS 
Sbjct: 89  SVNIRGENGLGRVNTMVDGVTQTFYSTALDSGQSGGSSQFGAAIDPNFIAGVDVNKSNFS 148 

Query: 158 GSAGINSLAGSANLRTLGVDDVVQXXXXXXXXXXXXXXXXXXXXXAMAAIGARKWLESGA 217 
           G++GIN+LAGSAN RTLGV+DV+ M RKWL++G 
Sbjct: 149 GASGINALAGSANFRTLGVNDVITDDKPFGIILKGMTGSNATKSNFMTMAAGRKWLDNGG 208 

Query: 218 SVGVLYGHSRRGVAQNYRVGGGGQHIGNFGEEYLERRKQQYFVQEGGLKFNAGSGKWERD 277 

VGV+YG+S+R V+Q+YR+ GGG+ + + G++ L + K+ YF + G N G+W D 
Sbjct: 209 YVGVVYGYSQREVSQDYRI-GGGERLASLGQDILAKEKEAYF-RNAGYILNP-EGQWTPD 265 

Query: 278 LQRQYWK TKWY KKYEDPQELQK YIEE 303 

L +++W +Y KK +D ++LQK IEE 

Sbjct: 266 LSKKHWSCNKPDYQKNGDCSYYRIGSAAKTRREILQELLTNGKKPKDIEKLQKGNDGIEE 325 

Query: 304 HDKSWRENLAPQYDITPIDPSGLKQQSAGNLFKLEYDGVFNKYTAQFRDLNTRIGSRKII 363 

DKS+ N QY + PI + P L+ +S +L K EY AQ R L+ +IGSRKI 

Sbjct: 326 TDKSFERN-KDQYSVAPIEPGSLQSRSRSHLLKFEYGDDHQNLGAQLRTLDNKIGSRKIE 384 

Query: 364 NRNYQFNYGLSLNPYTNLNLTAAYNSGRQKYPKGAKFTGWGLLKDFETYNNAKILDLNNT 423 

NRNYQ NY + N Y +LNL AA+N G+ YPKG F GW + T N A I+D+NN+ 

Sbjct: 385 NRNYQVNYNFNNNSYLDLNLMAAHNIGKTIYPKGGFFAGWQVADKLITKNVANIVDINNS 444 

Query: 424 ATFRLPRETELQTTLGFNYFHNEYGKNRFPEELGLFFDGPDQDNGLYSY--LGRFKGDKG 481 
TF LP+E +L+TTLGFNYF NEY KNRFPEEL LF++ D GLYS+ GR+ G K 

Sbjct: 445 HTFLLPKEIDLKTTLGFNYFTNEYSKNRFPEELSLFYNDASHDQGLYSHSKRGRYSGTKS 504 

Query: 482 LLPQKSTIVQPAGSQYFNTFYFDAALKKDIYRLNYSTNAINYRFGGEYTGYYGSENEFKR 541 
LLPQ+S I+QP+G Q F T YFD AL K IY LNYS N +Y F GEY GY 
Sbjct: 505 LLPQRSVILQPSGKQKFKTVYFDTALSKGIYHLNYSVNFTHYAFNGEYVGY 555 






Query: 542 AFGENSPAYKEHCDPSCGLYEPVLKKYGKKRANNHSVSISADFGDYFMPFAGYSRTHRMP 601 

EN+ + + EP+L K G K+A NHS ++SA+ DYFMPF YSRTHRMP 

Sbjct: 556 ENTAGQQ INEPILHKSGHKKAFNHSATLSAELSDYFMPFFTYSRTHRMP 604 

Query: 602 NIQEMYFSQIGDSGVHTALKPERANTWQFGFNTYKKGLLKQDDILGLKLVGYRSRIDNYI 661 

NIQEM+FSQ+ ++GV+TALKPE+++T+Q GFNTYKKGL QDD+LG+KLVGYRS I NYI 

Sbjct: 605 NIQEMFFSQVSNAGVNTALKPEQSDTYQLGFNTYKKGLFTQDDVLGVKLVGYRSFIKNYI 664 



Query: 662 HNVYGKWWDLNGDIPSWVGSTGLAYTIRHRNFKDKVHKHGFELELNYDYGRFFTNLSYAY 721 

HNVYG WW +P+W S G YTI H+N+K V K G ELE+NYD GRFF N+SYAY 

Sbjct: 665 HNVYGVWW--RDGMPTWAESNGFKYTIAHQNYKPIVKKSGVELEINYDMGRFFANVSYAY 722 

Query: 722 QKSTQPTNFSDASESPNNASKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGA 781 
Q++ QPTN++DAS PNNAS+ED LKQGYGLSRVS LP+DYGRLE+GTRW KLTLG A 

Sbjct: 723 QRTNQPTNYADASPRPNNASQEDILKQGYGLSRVSMLPKDYGRLELGTRWFDQKLTLGLA 782 

Query: 782 MRYFGKSIRATAEERYIDGTNGGNTSNVRQLGKRSIKQTETLARQPLIFDFYAAYEPKKN 841 
RY+GKS RAT EE YI+G+ + +R+ ++K+TE + +QP+I D + +YEP K+ 

Sbjct: 783 ARYYGKSKRATIEEEYINGSR-FKKNTLRRENYYAVKKTEDIKKQPIILDLHVSYEPIKD 841 

Query: 842 LIFRAEVKNLFDRRYIDPLDAGNDAATQRYYSSFDPKDKDEDVTCNADKTLCNGKYGGTS 901 

LI +AEV+NL D+RY+DPLDAGNDAA+QRYYSS + + C D + C GG+ 

Sbjct: 842 LIIKAEVQNLLDKRYVDPLDAGNDAASQRYYSSL NNS IECAQDSSAC GGSD 892 



Query: 902 KSVLTNFARGRTFLMTMSYKF 922 

K+VL NFARGRT++++++YKF 
Sbjct: 893 KTVLYNFARGRTYILSLNYKF 913 



The underlined motif in the gonococcal protein (also present in the meningococcal protein) is 
predicted to be an ATP/GTP-binding site motif A (P-loop). This analysis suggests that these 
proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for 
vaccines or diagnostics, or for raising antibodies. 
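
The motif prediction above can be illustrated with a short scan. The sketch below assumes that the prediction corresponds to the PROSITE pattern PS00017, [AG]-x(4)-G-K-[ST]; the text does not name the tool that was actually used, and the function name find_p_loops and the variables fragment and offset are hypothetical. The fragment is residues 776-800 of SEQ ID 882 as listed above.

```python
import re

# PROSITE PS00017 (ATP/GTP-binding site motif A, P-loop): [AG]-x(4)-G-K-[ST].
# A lookahead is used so that overlapping matches would also be reported.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")


def find_p_loops(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched residues) for every P-loop match."""
    seq = re.sub(r"\s+", "", protein).upper()
    return [(m.start() + 1, m.group(1)) for m in P_LOOP.finditer(seq)]


# Residues 776-800 of SEQ ID 882, copied from the listing above.
fragment = "LTLGGAMRYFGKSIRATAEERYIDG"
offset = 775  # number of residues preceding the fragment in the full protein

hits = [(start + offset, motif) for start, motif in find_p_loops(fragment)]
print(hits)
```

Under these assumptions the only hit is AMRYFGKS starting at residue 781, which agrees with the BLAST numbering above (Query: 782 begins at MRYFGKS).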

Example 104 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 885>: 

1 ATGAACCTGA TTTCACGTTA CATCATCCGT CAAATGGCGG TTATGGCGGT 
51 TTACGCGCTC CTTGCCTTCC TCGCTTTGTA CAGCTTTTTT GAAATCCTGT 
101 ACGAAACCGG CAACCTCGGC AAAGGCAGTT ACGGCATATG GGAAATGCTG 
151 GGCTACACCG CCCTCAAAAT GCCCGCCCGC GCCTACGAAC TGATTCCCCT 
201 CGCCGTCCTT ATCGGCGGAC TGGTCTCCCT CAGCCAGCTT GCCGCCGGCA 
251 GCGAACTGAC CGTCATCAAA GCCAGCGGCA TGAGCACCAA AAAGCTGCTG 
301 TTGATTCTGT CGCAGTTCGG TTTTATTTTT GCTATTGCCA CCGTCGCGCT 
351 CGGCGAATGG GTTGCGCCCA CACTGAGCCA AAAAGCCGAA AACATCAAAG 
401 CCGCCGCCAT CAACGGCAAA ATCAGCACCG GCAATACCGG CCTTTGGCTG 
451 AAAGAAAAAA ACAGCGTGAT CAATGTGCGC GAAATGTTGC CCGACCAT... 


This corresponds to the amino acid sequence <SEQ ID 886; ORF112>: 

1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEML 

51 GYTALKMPAR AYELIPLAVL IGGLVSLSQL AAGSELTVIK ASGMSTKKLL 

101 LILSQFGFIF AIATVALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL 

151 KEKNSVINVR EMLPDH... 
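
The correspondence between the nucleotide and amino acid listings can be checked by translating the DNA in frame. The following is a minimal sketch, assuming the standard genetic code and reading from position 1; the names CODON_TABLE and translate are illustrative and are not taken from the source. Codons containing anything other than A, C, G or T are reported as X.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; whitespace is ignored, unknown codons give X."""
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


# First row of SEQ ID 885 (positions 1-50), copied from the listing above.
first_row = "ATGAACCTGA TTTCACGTTA CATCATCCGT CAAATGGCGG TTATGGCGGT"
print(translate(first_row))  # MNLISRYIIRQMAVMA
```

The output matches the first sixteen residues of SEQ ID 886. Row 451 of SEQ ID 885 starts a codon (450 is divisible by 3), so it can be translated on its own and gives KEKNSVINVREMLPDH, the end of the partial protein.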

Further work revealed a further partial nucleotide sequence <SEQ ID 887>: 

1 ATGAACCTGA TTTCACGTTA CATCATCCGT CAAATGGCGG TTATGGCGGT 
51 TTACGCGCTC CTTGCCTTCC TCGCTTTGTA CAGCTTTTTT GAAATCCTGT 
101 ACGAAACCGG CAACCTCGGC AAAGGCAGTT ACGGCATATG GGAAATGCTG 
151 gGCTACACCG CCCTCAAAAT GCCCGCCCGC GCCTACGAAC TGATTCCCCT 
201 CGCCGTCCTT ATCGGCGGAC TGGTCTCCCT CAGCCAGCTT GCCGCCGGCA 
251 GCGAACTGAC CGTCATCAAA GCCAGCGGCA TGAGCACCAA AAAGCTGCTG 
301 TTGATTCTGT CGCAGTTCGG TTTTATTTTT GCTATTGCCA CCGTCGCGCT 
351 CGGCGAATGG GTTGCGCCCA CACTGAGCCA AAAAGCCGAA AACATCAAAG 
401 CCGCCGCCAT CAACGGCAAA ATCAGCACCG GCAATACCGG CCTTTGGCTG 
451 AAAGAAAAAA ACAGCrTkAT CAATGTGCGC GAAATGTTGC CCGACCATAC 
501 GCTTTTGGGC ATCAAAATTT GGGCGCGCAA CGATAAAAAC GAATTGGCAG 
551 AGGCAGTGGA AGCCGATTCC GCCGTTTTGA ACAGCGACGG CAGTTGGCAG 
601 TTGAAAAACA TCCGCCGCAG CACGCTTGGC GAAGACAAAG TCGAGGTCTC 
651 TATTGCGGCT GAAGAAAACT GGCCGATTTC CGTCAAACGC AACCTGATGG 
701 ACGTATTGCT CGTCAAACCC GACCAAATGT CCGTCGGCGA ACTGACCACC 
751 TACATCCGCC ACCTCCAAAA CAACAGCCAA AACACCCGAA TCTACGCCAT 
801 CGCATGGTGG CGCAAATTGG TTTACCCCGC CGCAGCCTGG GTGATGGCGC 
851 TCGTCGCCTT TGCCTTTACC CCGCAAACCA CCCGCCACGG CAATATGGGC 
901 TTAAAACTCT TCGGCGGCAT CTGTsTCGGA TTGCTGTTCC ACCTTGCCGG 
951 ACGGCTCTTT GGGTTTACCA GCCAACTCGG... 

This corresponds to the amino acid sequence <SEQ ID 888; ORF112-1>: 

1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEML 

51 GYTALKMPAR AYELIPLAVL IGGLVSLSQL AAGSELTVIK ASGMSTKKLL 

101 LILSQFGFIF AIATVALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL 

151 KEKNSXINVR EMLPDHTLLG IKIWARNDKN ELAEAVEADS AVLNSDGSWQ 

201 LKNIRRSTLG EDKVEVSIAA EENWPISVKR NLMDVLLVKP DQMSVGELTT 

251 YIRHLQNNSQ NTRIYAIAWW RKLVYPAAAW VMALVAFAFT PQTTRHGNMG 

301 LKLFGGICXG LLFHLAGRLF GFTSQL... 



Computer analysis of this amino acid sequence predicts two transmembrane domains and gave the 
following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF112 shows 96.4% identity over a 166 aa overlap with an ORF (ORF112a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf 112 .pep MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR 
I M I I I I I I I I I I I I I I M I I M I I I I I I I I I M M I I I 1 I M 1 I M I I I I I I M I II 
orf 112a MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMXGYTALKMXAR 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 112 .pep AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 
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||||:M!tlllMI! IIMMIM:!MIIIM!M!M!)M!1MM!]IIMMI 
orfll2a AYELMPLAVLIGGLVSXSQLAAGSELXVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 

70 80 90 100 110 120 

130 140 150 160 

orfll2 pep VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSVINVREMLPDH 
I | ( M II I I t I I I I I I M I M M I M I I I I I M I I : I I I I I I I I M 
orfll2a VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSIINVREMLPDHTLLGIKIWARNDKN 
130 140 150 160 170 180 

orfll2a ELAEAVEADSAVLNSDGSWQLKNIRRSTLGEDKVEVSIAAEEXWPISVKRNLMDVLLVKP 
190 200 210 220 230 240 



The ORF112a nucleotide sequence <SEQ ID 889> is: 

1 ATGAACCTGA TTTCACGTTA CATCATCCGT CAAATGGCGG TTATGGCGGT 

51 TTACGCGCTC CTTGCCTTCC TCGCTTTGTA CAGCTTTTTT GAAATCCTGT 

101 ACGAAACCGG CAACCTCGGC AAAGGCAGTT ACGGCATATG GGAAATGNTG 

151 GGNTACACCG CCCTCAAAAT GNCCGCCCGC GCCTACGAAC TGATGCCCCT 

201 CGCCGTCCTT ATCGGCGGAC TGGTCTCTNT CAGCCAGCTT GCCGCCGGCA 

251 GCGAACTGAN CGTCATCAAA GCCAGCGGCA TGAGCACCAA AAAGCTGCTG 

301 TTGATTCTGT CGCAGTTCGG TTTTATTTTT GCTATTGCCA CCGTCGCGCT 

351 CGGCGAATGG GTTGCGCCCA CACTGAGCCA AAAAGCCGAA AACATCAAAG 

401 CCGCGGCCAT CAACGGCAAA ATCAGTACCG GCAATACCGG CCTTTGGCTG 

451 AAAGAAAAAA ACAGCATTAT CAATGTGCGC GAAATGTTGC CCGACCATAC 

501 CCTGCTGGGC ATTAAAATCT GGGCCCGCAA CGATAAAAAC GAACTGGCAG 

551 AGGCAGTGGA AGCCGATTCC GCCGTTTTGA ACAGCGACGG CAGTTGGCAG 

601 TTGAAAAACA TCCGCCGCAG CACGCTTGGC GAAGACAAAG TCGAGGTCTC 

651 TATTGCGGCT GAAGAAAANT GGCCGATTTC CGTCAAACGC AACCTGATGG 

701 ACGTATTGCT CGTCAAACCC GACCAAATGT CCGTCGGCGA ACTGACCACC 

751 TACATCCGCC ACCTCCAAAN NNACAGCCAA AACACCCGAA TCTACGCCAT 

801 CGCATGGTGG CGCAAATTGG TTTACCCCGC CGCAGCCTGG GTGATGGCGC 

851 TCGTCGCCTT TGCCTTTACC CCGCAAACCA CCCGCCACGG CAATATGGGC 

901 TTAAAANTCT TCGGCGGCAT CTGTCTCGGA TTGCTGTTCC ACCTTGCCGG 

951 NCGGCTCTTC NGGTTTACCA GCCAACTCTA CGGCATCCCG CCCTTCCTCG 

1001 NCGGCGCACT ACCTACCATA GCCTTCGCCT TGCTCGCCGT TTGGCTGATA 

1051 CGCAAACAGG AAAAACGCTA A 

This encodes a protein having the amino acid sequence <SEQ ID 890>: 

1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEMX 

51 GYTALKMXAR AYELMPLAVL IGGLVSXSQL AAGSELXVIK ASGMSTKKLL 

101 LILSQFGFIF AIATVALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL 

151 KEKNSIINVR EMLPDHTLLG IKIWARNDKN ELAEAVEADS AVLNSDGSWQ 

201 LKNIRRSTLG EDKVEVSIAA EEXWPISVKR NLMDVLLVKP DQMSVGELTT 

251 YIRHLQXXSQ NTRIYAIAWW RKLVYPAAAW VMALVAFAFT PQTTRHGNMG 

301 LKXFGGICLG LLFHLAGRLF XFTSQLYGIP PFLXGALPTI AFALLAVWLI 

351 RKQEKR* 

ORF112a and ORF112-1 show 96.3% identity in 326 aa overlap: 

orf 112a.pep MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMXGYTALKMXAR 
M I I I M M I I M M M I M I I M I I I I I M I M I M I I ! I M I M I I I ! M I I I ! I I 
orf 112-1 MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR 

orf 112a.pep AYELMPLAVLIGGLVSXSQLAAGSELXVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 

I II I : I I I II I I I ! M I I II II I I I : I I I M I II M M M M M I I I M I I I I M I I I I 
orf 112-1 AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 

orf 112a.pep VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSIINVREMLPDHTLLGIKIWARNDKN 

55 M M I I I I I I 1 M I 1 II II II I I I I I I I I I I I I M I I I I I I I I II I I I II I I I M I M I 

orf 112-1 VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSXINVREMLPDHTLLGIKIWARNDKN 

orf 112a. pep ELAEAVEADSAVLNSDGSWQLKNIRRSTLGEDKVEVSIAAEEXWPISVKRNLMDVLLVKP 

II M M II I M I M I M I I II I I I I I I I I I I I I I I I I 11 I II I II I I I M I M I M II I 
orf 112-1 ELAEAVEADSAVLNSDGSWQLKNIRRSTLGEDKVEVSIAAEENWPISVKRNLMDVLLVKP 

orf 112a . pep DQMSVGELTTYIRHLQXXSQNTRIYAIAWWRKLVYPAAAWVMALVAFAFT PQTTRHGNMG 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I II I I I II I II M M I M 
orf 112-1 DQMSVGELTTYIRHLQNNSQNTRIYAIAWWRKLVYPAAAWVMALVAFAFTPQTTRHGNMG 

orfll2a pep LKXFGGICLGLLFHLAGRLFXFTSQLYGIPPFLXGALPTIAFALLAVWLIRKQEKRX 

I I II I I I II 1 I I I ( M M I II I I 
orf 112-1 LKLFGGICXGLLFHLAGRLFGFTSQL 

Homology with a predicted ORF from N. gonorrhoeae 

ORF112 shows 95.8% identity over 166aa overlap with a predicted ORF (ORF112ng) from N. 
gonorrhoeae: 

orf 112 pep MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR 60 

I | | | M I ! I M I I I II I II I I I ft I I t II I I I I I i I t II I II I i I I I M M M II I M I I 
orfll2ng MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR 60 

orf 112 .pep AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 120 

I I I ! : I I I I I I t ! I : I M I I I ! I ! I I : I I ! ! I ! M I I I I I IN ! M I I M I I I : I I I I ! ! 
orfll2ng AYELMPLAVLIGGLASLSQLAAGSELAVIKASGMSTKKLLLILSQFGFIFAIAAVALGEW 120 

orf 112 .pep VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSVINVREMLPDH 166 

M M I I I I M I II M I M I I I I I I I I I I I 1 M I : 1 : M I I Mill 
orfll2ng VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKTSIINVRGMLPDHTLLGIKIWARNDKN 180 
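
The identity figure quoted for ORF112 and ORF112ng can be reproduced from the two listings. This is a minimal sketch, assuming an ungapped, position-by-position comparison over the 166 aa overlap with identity defined as matching columns divided by overlap length; the alignment program actually used may define identity differently when gaps are present. The function name percent_identity and the variable names are illustrative. The sequences are SEQ ID 886 and the first 166 residues of SEQ ID 892.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned columns in which both sequences carry the same residue."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    if not a:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


# ORF112 (SEQ ID 886), 166 residues.
orf112 = (
    "MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEML"
    "GYTALKMPARAYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLL"
    "LILSQFGFIFAIATVALGEWVAPTLSQKAENIKAAAINGKISTGNTGLWL"
    "KEKNSVINVREMLPDH"
)

# First 166 residues of ORF112ng (SEQ ID 892).
orf112ng = (
    "MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEML"
    "GYTALKMPARAYELMPLAVLIGGLASLSQLAAGSELAVIKASGMSTKKLL"
    "LILSQFGFIFAIAAVALGEWVAPTLSQKAENIKAAAINGKISTGNTGLWL"
    "KEKTSIINVRGMLPDH"
)

print(round(percent_identity(orf112, orf112ng), 1))  # 95.8
```

The two sequences differ at 7 of 166 positions, giving 159/166, which rounds to the 95.8% stated above.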

The complete length ORF112ng nucleotide sequence <SEQ ID 891> is: 

1 ATGAACCTGA TTTCACGTTA CATCATCCGC CAAATGGCGG TTATGGCGGT 

51 TTACGCGCTC CTTGCCTTCC TCGCTTTGTA CAGCTTTTTT GAAATCCTGT 

101 ACGAAACCGG CAACCTCGGC AAAGGCAGTT ACGGCATATG GGAAATGCTG 

151 GGCTACACCG CCCTCAAAAT GCCCGCCCGC GCCTACGAAC TCATGCCCCT 

201 CGCCGTCCTC ATCGGCGGAC TGGCCTCTCT CAGCCAGCTT GCCGCCGGCA 

251 GCGAACTGGC CGTCATCAAA GCCAGCGGCA TGAGCACCAA AAAGCTGCTG 

301 TTGATTCTGT CTCAGTTCGG TTTTATTTTT GCTATTGCCG CCGTCGCGCT 

351 CGGCGAATGG GTTGCGCCCA CGCTGAGCCA AAAAGCCGAA AACATCAAag 

401 cCGCCGCCAt taacggCAAA ATCAGCAccg gcAATACCGG CCTTTggcTG 

451 AAAGAAAAAa ccAGCATTAT CAATGTGcGc GGAATGTTGC CCGACCATAC 

501 GCTTTTGGGC ATCAAAATTT GGGCGCGCAA CGATAAAAAC GAATTGGCAG 

551 AGGCAGTGGA AGCCGATTCC GCCGTTTTGA ACAGCGACGG CAGCTGGCAG 

601 TTGAAAAACA TCCGCCGCAG CATCATGGGT ACAGACAAAA TCGAAACATC 

651 cgCCGCCGCC GAAGAAACTT gGCCGATTGC CGTCAGACGC AACCTGATGG 

701 ACGTATTGCT CGTCAAGCCC GACCAAATGT CCGTCGGCGA GCTGACCACC 

751 TACATCCGCC ACCTCCAAAA CAACAGCCAA AACACCCAAA TCTACGCCAT 

801 CGCATGGTGG CGTAAACTCG TTTACCCCGT CGCCGCATGG GTCATGGCGC 

851 TCGTTGCCTT CGCCTTTACG CCGCAAACCA CGCGCCACGG CAATATGGGC 

901 TTAAAACTCT TCGGCGGCAT CTGTCTCGGA TTGCTGTTCC ACCTTGCCGG 

951 CAGGCTCTTC GGGTTTACCA GCCAACTCTA CGGCACCCCA CCCTTCCTCG 

1001 CCGGCGCACT GCCTACCATA GCCTTCGCCT TGCTCGCTGT TTGGCTGATA 

1051 CGCAAACAGG AAAAACGTTG A 

This encodes a protein having amino acid sequence <SEQ ID 892>: 

1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEML 

51 GYTALKMPAR AYELMPLAVL IGGLASLSQL AAGSELAVIK ASGMSTKKLL 

101 LILSQFGFIF AIAAVALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL 

151 KEKTSIINVR GMLPDHTLLG IKIWARNDKN ELAEAVEADS AVLNSDGSWQ 

201 LKNIRRSIMG TDKIETSAAA EETWPIAVRR NLMDVLLVKP DQMSVGELTT 

251 YIRHLQNNSQ NTQIYAIAWW RKLVYPVAAW VMALVAFAFT PQTTRHGNMG 

301 LKLFGGICLG LLFHLAGRLF GFTSQLYGTP PFLAGALPTI AFALLAVWLI 

351 RKQEKR* 

ORF112ng and ORF112-1 show 94.2% identity in 326 aa overlap: 

         10 20 30 40 50 60 
orf112ng MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR 
         I I I M II I I II I I I M M I I I I I II I I I M I I M I I M M I II I I I I I II I I I II II I I I 
orf112-1 MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR 
         10 20 30 40 50 60 

         70 80 90 100 110 120 
orf112ng AYELMPLAVLIGGLASLSQLAAGSELAVIKASGMSTKKLLLILSQFGFIFAIAAVALGEW 
         I M I : M I M II I I : M I N I I I I II : I N I I I I I M M I 1 ! I I I M I II I II : 1 I 1 I I I 
orf112-1 AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 
         70 80 90 100 110 120 

         130 140 150 160 170 180 
orf112ng VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKTSIINVRGMLPDHTLLGIKIWARNDKN 
         I I I II I I I M I I I I ! I I I II II I I I I I I I I I M : I 11(1 I I I I II I M I I I I I M M I 
orf112-1 VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSXINVREMLPDHTLLGIKIWARNDKN 
         130 140 150 160 170 180 

         190 200 210 220 230 240 
orf112ng ELAEAVEADSAVLNSDGSWQLKNIRRSIMGTDKIETSAAAEETWPIAVRRNLMDVLLVKP 
         I I I M I I I I I I I I M I II I I I I II I II : I II : I : I I M I : I M : I : M I I I M I I I I 
orf112-1 ELAEAVEADSAVLNSDGSWQLKNIRRSTLGEDKVEVSIAAEENWPISVKRNLMDVLLVKP 
         190 200 210 220 230 240 

         250 260 270 280 290 300 
orf112ng DQMSVGELTTYIRHLQNNSQNTQIYAIAWWRKLVYPVAAWVMALVAFAFTPQTTRHGNMG 
orf112-1 DQMSVGELTTYIRHLQNNSQNTRIYAIAWWRKLVYPAAAWVMALVAFAFTPQTTRHGNMG 
         250 260 270 280 290 300 

         310 320 330 340 350 
orf112ng LKLFGGICLGLLFHLAGRLFGFTSQLYGTPPFLAGALPTIAFALLAVWLIRKQEKRX 
orf112-1 LKLFGGICXGLLFHLAGRLFGFTSQL 
         310 320 



This analysis suggests that these proteins from N. meningitidis and N. gonorrhoeae, and their 
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 105 



Table III lists several Neisseria strains which were used to assess the conservation of the sequence 
of ORF 4 among different strains. 



TABLE III - List of Neisseria Strains Used for Gene Variability Study of ORF 4 

ORF4 gene variability: List of used Neisseria strains 

Identification   Strains            Source / reference 
number 

Group B 
zv01_4           NG6/88             R. Moxon / Seiler et al., 1996 
zv02_4           BZ198              R. Moxon / Seiler et al., 1996 
zv03_4ass        NG3/88             R. Moxon / Seiler et al., 1996 
zv04_4           297-0              R. Moxon / Seiler et al., 1996 
zv05_4           1000               R. Moxon / Seiler et al., 1996 
zv06_4           BZ147              R. Moxon / Seiler et al., 1996 
zv07_4           BZ169              R. Moxon / Seiler et al., 1996 
zv08_4           528                R. Moxon / Seiler et al., 1996 
zv09_4           NGP165             R. Moxon / Seiler et al., 1996 
zv10_4           BZ133              R. Moxon / Seiler et al., 1996 
zv11_4           NGE31              R. Moxon / Seiler et al., 1996 
zv12_4ass        NGF26              R. Moxon / Seiler et al., 1996 
zv13_4           NGE28              R. Moxon / Seiler et al., 1996 
zv15_4           SWZ107             R. Moxon / Seiler et al., 1996 
zv16_4           NGH15              R. Moxon / Seiler et al., 1996 
zv17_4           NGH36              R. Moxon / Seiler et al., 1996 
zv18_4           BZ232              R. Moxon / Seiler et al., 1996 
zv19_4           BZ83               R. Moxon / Seiler et al., 1996 
zv20_4           44/76              R. Moxon / Seiler et al., 1996 
zv21_4           MC58               R. Moxon 
zv96_4           2996               Our collection 

Group A 
zv22_4           205900             R. Moxon 
z2491_4          Z2491              R. Moxon / Maiden et al., 1998 

Group C 
zv24_4           90/18311           R. Moxon 
zv25_4           93/4286            R. Moxon 

Others 
zv26_4ass        A22 (group W)      R. Moxon / Maiden et al., 1998 
zv27_4           E26 (group X)      R. Moxon / Maiden et al., 1998 
zv28_4           860800 (group Y)   R. Moxon / Maiden et al., 1998 
zv29_4           E32 (group Z)      R. Moxon / Maiden et al., 1998 

Gonococcus 
zv32_4           NgF62              R. Moxon / Maiden et al., 1998 
zv33_4           Ng SN4             R. Moxon 
fa1090_4         FA1090             R. Moxon 


References: 

Seiler A. et al, Mol. Microbiol., 1996, 19(4):841-856. 
Maiden et al., Proc. Natl. Acad. Sci. USA, 1998, 95:3140-3145. 



The amino acid sequences for each listed strain are as follows: 

>FA1090_4 <SEQ ID 893> 

MKTFFKTLSAAALALILAACGGQKDSAPAASAAAPSADNGAAKKEIVFGTTVGDFGDMVK 
EQIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEAF 
QVPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARALVMLNELGWIKLKDGINPLTAS 



KADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNW 
SAVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKYPAAWNEGAAK* 

>Z2491 4 <SEQ ID 894> 
MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 
QIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV01_4 <SEQ ID 895> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 
QIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV02_4 <SEQ ID 896> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAARNEGAAK* 

>ZV03_4ASS <SEQ ID 897> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV04_4 <SEQ ID 898> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV05_4 <SEQ ID 899> 
MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV06_4 <SEQ ID 900> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
QIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTAHKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV07_4 <SEQ ID 901> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 
QIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV08_4 <SEQ ID 902> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVELVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV09_4 <SEQ ID 902> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV10_4 <SEQ ID 903> 
NKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 
>ZV11_4 <SEQ ID 904> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 
QIQVELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV12_4ASS <SEQ ID 905> 
MKTFFKTLSAAALALILAACGGQKDRAPAASASAASENGAAKKEILFGTTVGDLGDMVKE 
QIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV13_4 <SEQ ID 906> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 
QIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV15_4 <SEQ ID 907> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAGNEGAAK* 

>ZV16_4 <SEQ ID 908> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV17_4 <SEQ ID 909> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
QIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV18_4 <SEQ ID 910> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV19_4 <SEQ ID 911> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 
QIQAELEKKGYTVELVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV20_4 <SEQ ID 912> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 
QIQAELEKKGYTVELVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV21_4 <SEQ ID 913> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 
QIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV22_4 <SEQ ID 914> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDLVKE 
QIQPELEKKGYTVELVEFTDYVRPNLALGEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 



>ZV24_4ASS <SEQ ID 915> 
MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVELVEFTDDVRPNLALGEGELDIIVFQHKPYLDDFKKEQNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV25_4 <SEQ ID 916> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
QIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARALVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV26_4 <SEQ ID 917> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV27_4 <SEQ ID 918> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 
QIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV28_4 <SEQ ID 919> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV29_4 <SEQ ID 920> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 
QIQVELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV32_4 <SEQ ID 921> 

MKTFFKTLSAAALALILAACGGQKDSAPAASAAAPSADNGAAKKEIVFGTTVGDFGDMVK 
EQIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEAF 
QVPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARALVMLNELGWIKLKDGINPLTAS 
KADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNW 
SAVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKYPAAWNEGAAK* 

>ZV33_4 <SEQ ID 922> 

MKTFFKTLSAAALALILAACGGQKDSAPAASAAAPSADNGAAKKEIVFGTTVGDFGDMVK 
EQIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEAF 
QVPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARALVMLNELGWIKLKDGINPLTAS 
KADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNW 
SAVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKYPAAWNEGAAK* 

>ZV96_4 <SEQ ID 923> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
QIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 



Figure 8 shows the alignment of the sequences from each of these strains. Dark shading 
indicates regions of homology, and gray shading indicates conservation of amino acids with 
similar characteristics. As is readily discernible, ORF 4 is significantly conserved among the 
various strains, further confirming its utility as an antigen for both vaccines and 
diagnostics. 
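
The degree of conservation described above can be illustrated with a simple position-by-position comparison. The following Python sketch is not the method used to produce Figure 8; it is a minimal illustration that assumes the two inputs are already aligned and of equal length. The function names `percent_identity` and `differing_positions` are hypothetical, and the two fragments are the first 60 residues of the Z2491_4 and ZV02_4 sequences as listed above.

```python
# Minimal sketch: position-wise identity between two pre-aligned
# protein sequences of equal length. No gap handling is attempted,
# so sequences of different lengths must first be aligned with a
# proper alignment tool.

def percent_identity(a: str, b: str) -> float:
    """Return the percentage of positions carrying identical residues."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def differing_positions(a: str, b: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, residue in a, residue in b) for each mismatch."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# First 60 residues of two of the listed sequences (copied from the listing).
Z2491_FRAGMENT = "MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE"
ZV02_FRAGMENT = "MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE"

if __name__ == "__main__":
    print(f"identity: {percent_identity(Z2491_FRAGMENT, ZV02_FRAGMENT):.1f}%")
    for pos, res_a, res_b in differing_positions(Z2491_FRAGMENT, ZV02_FRAGMENT):
        print(f"position {pos}: {res_a} -> {res_b}")
```

Because some of the listed sequences differ in length near the N-terminus (for example FA1090_4 compared with Z2491_4), a direct comparison of this kind applies only to sequences that have first been aligned; it does not distinguish conservative from non-conservative substitutions, which Figure 8 indicates by gray shading.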



